(57) Abstract



The present invention includes software and hardware components to enable digital data communication over standard telephone lines. 
The present invention converts analog voice signals to digital data, compresses that data and places the compressed speech data into packets 
for transfer over the telephone lines to a remote site. A voice control digital signal processor (DSP) operates to use one of a plurality of 
speech compression algorithms which produce a scalable amount of compression. The rate of compression is inversely proportional to
the quality of the speech the compression algorithm is able to reproduce. The higher the compression, the lower the reproduction quality.
The selection of the rate of compression is dependent on such factors as the speed or data bandwidth on the communications connection
between the two sites, the data demand between the sites and the amount of silence detected in the speech signal. The voice compression rate
is dynamically changed as the aforementioned factors change. A negotiation handshake protocol is described which enables the two sites 
to negotiate the compression rate based on such factors. 
Field of the Invention
The present invention relates to communications systems and in 
particular to modem communications having simultaneous digitized voice and 
data capabilities. 
Background of the Invention

Simultaneous voice and modem data transmitted over the same 
communications link between two sites has been accomplished in several 
ways. The most common communications link used between two sites is the 
telephone line. The most common data handling equipment to communicate 
over a communications link is the computer modem which modulates digital
data onto a carrier for transmission in the voice band of the telephone line. A
wide variety of modulation standards have been promulgated by such
international groups as the CCITT for communication in the voice band. The
data bandwidth for such modulation standards is typically fixed and the
throughput rate of data is also assumed to be fixed.

In some modulation standards, there are provisions for
changing the modulation data rate based on the quality of the communications
link. For example, on a noisy telephone line, a 9600 baud modulation rate
may have such a high bit error rate that the modulation must be changed to a
2400 baud connection. This is done in a handshake communication protocol
between the two sites when the communications link simply cannot support
the higher rate.
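As an illustration of the rate fallback just described, the following Python sketch picks the highest modulation rate whose measured bit error rate is acceptable. The acceptance threshold and the per-rate error measurements are hypothetical inputs chosen for illustration; how a modem actually measures them during its handshake is outside this sketch.

```python
from typing import Dict, Optional

# Hypothetical acceptance threshold; not taken from any CCITT recommendation.
MAX_ACCEPTABLE_BER = 1e-5


def select_rate(ber_by_rate: Dict[int, float],
                max_ber: float = MAX_ACCEPTABLE_BER) -> Optional[int]:
    """Return the highest rate whose measured bit error rate is acceptable.

    ber_by_rate maps a candidate modulation rate to the bit error rate
    observed at that rate. Returns None if no rate is usable.
    """
    for rate in sorted(ber_by_rate, reverse=True):
        if ber_by_rate[rate] <= max_ber:
            return rate
    return None


# A noisy line: 9600 and 4800 are too error-prone, so the link falls back to 2400.
print(select_rate({9600: 1e-3, 4800: 1e-4, 2400: 1e-6}))  # prints 2400
```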

There is a need in the art, however, for an efficient and cost
effective way of maximizing bandwidth over the communications link
between two sites to enable the simultaneous transmission of voice and data.
There is a need, therefore, to negotiate the data bandwidth between the sites,
to negotiate the compression rate for the voice compression algorithms used to
compress the voice, and to allocate and reallocate the ratio of
compressed voice to digital data transmitted over the communications link.

Summary of the Invention 
The present invention solves the aforementioned problems and
shortcomings of the existing art and solves other problems not listed above
which will become apparent to those skilled in the art upon reading and
understanding the present specification and claims. The present invention
describes a voice over data modem which allows the operator to
simultaneously transmit voice and data communication to a remote site. This
voice over data function dynamically allocates data bandwidth over the
telephone line depending on the demands of the voice grade digitized signal
and the modulation speed of the communication link between the two sites.

The present invention includes software and hardware
components to enable digital data communication over standard telephone
lines. The present invention converts analog voice signals to digital data,
compresses that data and places the compressed speech data into packets for
transfer over the telephone lines to a remote site. A voice control digital
signal processor (DSP) operates to use one of a plurality of speech
compression algorithms which produce a scalable amount of compression.
The rate of compression is inversely proportional to the quality of the speech
the compression algorithm is able to reproduce. The higher the compression,
the lower the reproduction quality. The selection of the rate of compression is
dependent on such factors as the speed or data bandwidth on the
communications connection between the two sites, the data demand between
the sites and the amount of silence detected in the speech signal. The voice
compression rate is dynamically changed as the aforementioned factors
change. A negotiation handshake protocol is described which enables the two
sites to negotiate the compression rate based on such factors.
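The selection and negotiation described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the method of the invention: the coder rates in `CODER_RATES_BPS` are hypothetical, silence is modeled as a simple fraction of time during which no speech frames are sent, and the negotiation rule (both sites adopt the lower, more compressed of the two proposals) is an assumed policy, since the handshake messages are not specified here.

```python
from typing import Sequence

# Hypothetical voice coder output rates in bits/s. A lower rate means more
# compression and therefore lower reproduction quality.
CODER_RATES_BPS: Sequence[int] = (16000, 9600, 8000, 4800)


def select_voice_rate(link_bps: int, data_demand_bps: int,
                      silence_fraction: float) -> int:
    """Pick the least-compressed coder rate that fits the voice budget.

    The budget is the link rate left over after the data demand. Because
    silent frames are not sent, a coder running at `rate` only consumes
    rate * (1 - silence_fraction) on average.
    """
    active = max(1.0 - silence_fraction, 1e-6)  # avoid division by zero
    voice_budget = max(link_bps - data_demand_bps, 0) / active
    for rate in sorted(CODER_RATES_BPS, reverse=True):
        if rate <= voice_budget:
            return rate
    return min(CODER_RATES_BPS)  # nothing fits: fall back to maximum compression


def negotiate(local_rate: int, remote_rate: int) -> int:
    """Assumed rule: both sites adopt the lower (more compressed) proposal."""
    return min(local_rate, remote_rate)


print(select_voice_rate(14400, 4000, silence_fraction=0.0))  # 9600
print(select_voice_rate(14400, 4000, silence_fraction=0.5))  # 16000
print(negotiate(9600, 16000))                                # 9600
```

Taking the minimum of the two proposals guarantees that the agreed rate is one both sites considered affordable; a real handshake could weigh the factors differently.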

Description of the Drawings 

In the drawings, where like numerals describe like components
throughout the several views,

Figure 1 shows the telecommunications environment within
which the present system may operate in several of the possible modes of
communication;

Figure 2 is a block diagram of the hardware components of the
present system;

Figure 3 is a detailed function flow diagram of the speech
compression algorithm;

Figure 4 is a detailed function flow diagram of the speech
decompression algorithm;

Figure 5 is a signal flow diagram of the speech compression
algorithm;

Figure 6 is a signal flow diagram of the speech compression
algorithm showing details of the code book synthesis; and

Figure 7 is a detailed function flow diagram of the voice/data
multiplexing function.

Detailed Description of the Preferred Embodiments
In the following detailed description of the preferred 
embodiment, reference is made to the accompanying drawings which form a 
part hereof, and in which is shown by way of illustration specific 
embodiments in which the inventions may be practiced. These embodiments
are described in sufficient detail to enable those skilled in the art to practice
the invention, and it is to be understood that other embodiments may be
utilized and that structural changes may be made without departing from the
spirit and scope of the present inventions. The following detailed description
is, therefore, not to be taken in a limiting sense, and the scope of the present
inventions is defined by the appended claims. 

Figure 1 shows a typical arrangement for the use of the present 
system. Personal computer 10 is running the software components of the 
present system while the hardware components 20 include the data 
communication equipment and telephone headset. Hardware components 20
communicate over a standard telephone line 30 to one of a variety of remote
sites. One of the remote sites may be equipped with the present system
including hardware components 20a and software components running on
personal computer 10a. In one alternative use, the local hardware components
20 may be communicating over standard telephone line 30 to facsimile
machine 60. In another alternative use, the present system may be
communicating over a standard telephone line 30 to another personal
computer 80 through a remote modem 70. In another alternative use, the
present system may be communicating over a standard telephone line 30 to a
standard telephone 90. Those skilled in the art will readily recognize the wide
variety of communication interconnections possible with the present system by
reading and understanding the following detailed description.

General Overview
The present inventions are embodied in a commercial product
by the assignee, Multi-Tech Systems, Inc. The software component operating
on a personal computer is sold under the commercial trademark of
MultiExpressPCS™ personal communications software while the hardware
component of the present system is sold under the commercial name of
MultiModemPCS™, Intelligent Personal Communications System Modem. In
the preferred embodiment the software component runs under Microsoft®
Windows™; however, those skilled in the art will readily recognize that the
present system is easily adaptable to run under any single or multi-user, single
or multi-window operating system.

The present system is a multifunction communication system
which includes hardware and software components. The system allows the
user to connect to remote locations equipped with a similar system or with
modems, facsimile machines or standard telephones over a single analog
telephone line. The software component of the present system includes a
number of modules which are described in more detail below.

The telephone module allows the system to operate as a
conventional or sophisticated telephone system. The system converts voice
into a digital signal so that it can be transmitted or stored with other digital
data, like computer information. The telephone function supports PBX and
Centrex features such as call waiting, call forwarding, caller ID and three-way
calling. This module also allows the user to mute, hold or record a
conversation. The telephone module enables the handset, headset or
hands-free speaker telephone operation of the hardware component. It includes
on-screen push button dialing, speed-dial of stored numbers and digital recording
of two-way conversations.

The voice mail portion of the present system allows this system 
to operate as a telephone answering machine by storing voice messages as 
digitized voice files along with a time/date voice stamp. The digitized voice
files can be saved and sent to one or more destinations immediately or at a
later time using a queue scheduler. The user can also listen to, forward or
edit the voice messages which have been received with a powerful digital 
voice editing component of the present system. This module also creates 
queues for outgoing messages to be sent at preselected times and allows the 
users to create outgoing messages with the voice editor. 
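The queue scheduler mentioned above can be pictured as a priority queue keyed on the preselected send time. The sketch below uses Python's standard `heapq` module; the `QueuedMessage` and `QueueScheduler` names and their fields are hypothetical, chosen only for illustration.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(order=True)
class QueuedMessage:
    # Only send_at takes part in ordering, so the heap is keyed on send time.
    send_at: datetime
    destination: str = field(compare=False)
    voice_file: str = field(compare=False)


class QueueScheduler:
    """Holds outgoing messages until their preselected send time (hypothetical)."""

    def __init__(self) -> None:
        self._heap: List[QueuedMessage] = []

    def schedule(self, msg: QueuedMessage) -> None:
        heapq.heappush(self._heap, msg)

    def due(self, now: datetime) -> List[QueuedMessage]:
        """Pop and return every message whose send time has arrived, earliest first."""
        ready: List[QueuedMessage] = []
        while self._heap and self._heap[0].send_at <= now:
            ready.append(heapq.heappop(self._heap))
        return ready


q = QueueScheduler()
q.schedule(QueuedMessage(datetime(2024, 1, 1, 10, 0), "555-0102", "b.wav"))
q.schedule(QueuedMessage(datetime(2024, 1, 1, 9, 0), "555-0101", "a.wav"))
print([m.destination for m in q.due(datetime(2024, 1, 1, 9, 30))])  # ['555-0101']
```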
The fax manager portion of the present system is a queue for
incoming and outgoing facsimile pages. In the preferred embodiment of the
present system, this function is tied into the Windows "print" command once
the present system has been installed. This feature allows the user to create
faxes from any Windows®-based document that uses the "print" command.
The fax manager function of the present system allows the user to view
queued faxes which are to be sent or which have been received. This module
creates queues for outgoing faxes to be sent at preselected times and logs
incoming faxes with time/date stamps.

The multi-media mail function of the present system is a utility 
which allows the user to compose documents that include text, graphics and
voice messages using the message composer function of the present system,
described more fully below. The multi-media mail utility of the present
system allows the user to schedule messages for transmittal and queues up the
messages that have been received so that they can be viewed at a later time.
The show and tell function of the present system allows the
user to establish a data over voice (DOV) communications session. When the
user is transmitting data to a remote location similarly equipped, the user is
able to talk to the person over the telephone line while concurrently
transferring the data. This voice over data function is accomplished in the
hardware components of the present system. It digitizes the voice and
transmits it in a dynamically changing allocation of voice data and digital data
multiplexed in the same transmission. The allocation at a given moment is
selected depending on the amount of voice digital information required to be
transferred. Quiet voice intervals allocate greater space to the digital data
transmission.

The terminal function of the present system allows the user to
establish a data communications session with another computer which is
equipped with a modem but which is not equipped with the present system.
This feature of the present system is a Windows™-based data communications
program that reduces the need for issuing "AT" commands by providing
menu-driven and "pop-up" window alternatives.

The address book function of the present system is a database
that is accessible from all the other functions of the present system. This
database is created by the user inputting destination addresses and telephone
numbers for data communication, voice mail, facsimile transmission, modem
communication and the like. The address book function of the present system
may be utilized to broadcast communications to a wide variety of recipients.
Multiple linked databases, with separate address books for different groups
and different destinations, may be created by the user. The address book
function includes a textual search capability which allows fast and efficient
location of specific addresses as described more fully below.

Hardware Components

Figure 2 is a block diagram of the hardware components of the
present system corresponding to reference number 20 of Figure 1. These
components form the link between the user, the personal computer running the
software component of the present system and the telephone line interface.
As will be more fully described below, the interface to the hardware
components of the present system is via a serial communications port
connected to the personal computer. The interface protocol is well ordered
and defined such that other software systems or programs running on the
personal computer may be designed and implemented which would be capable
of controlling the hardware components shown in Figure 2 by using the
control and communications protocol defined below.

In the preferred embodiment of the present system, three
alternate telephone interfaces are available: the telephone handset 301, a
telephone headset 302, and a hands-free microphone 303 and speaker 304.
Each of the three alternative interfaces connects to the digital telephone
coder-decoder (CODEC) circuit 305.

The digital telephone CODEC circuit 305 interfaces with the
voice control digital signal processor (DSP) circuit 306 which includes a voice
control DSP and CODEC. This circuit performs digital to analog (D/A)
conversion, analog to digital (A/D) conversion, coding/decoding and gain control,
and is the interface between the voice control DSP circuit 306 and the
telephone interface. The CODEC of the voice control circuit 306 transfers
digitized voice information in a compressed format through multiplexor circuit 310
to analog telephone line interface 309.

The CODEC of the voice control circuit 306 is actually an
integral component of a voice control digital signal processor integrated
circuit, as described more fully below. The voice control DSP of circuit 306
controls the digital telephone CODEC circuit 305 and performs voice compression
and echo cancellation.
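The echo cancellation algorithm of the voice control DSP is not detailed at this point. As a hedged illustration of the general technique only, the Python sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook choice for echo cancellation; the tap count, step size `mu` and regularizer `eps` are arbitrary illustrative values, not parameters of circuit 306.

```python
import random
from typing import List, Sequence


def nlms_echo_cancel(far_end: Sequence[float], mic: Sequence[float],
                     taps: int = 16, mu: float = 0.5,
                     eps: float = 1e-6) -> List[float]:
    """Cancel the echo of far_end present in mic using an NLMS adaptive filter.

    Returns the residual (mic minus estimated echo), sample by sample.
    """
    w = [0.0] * taps        # adaptive estimate of the echo path impulse response
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual: List[float] = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - echo_estimate
        # Normalize the step by input power so convergence is level-independent.
        norm = eps + sum(h * h for h in history)
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, history)]
        residual.append(e)
    return residual


# Simulated echo path: 0.6 gain after a 3-sample delay.
random.seed(1)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0, 0.0] + [0.6 * s for s in far[:-3]]
res = nlms_echo_cancel(far, mic)
print(sum(r * r for r in res[-200:]) < 0.01 * sum(m * m for m in mic[-200:]))
```

Once the filter has converged, the residual energy is a small fraction of the echo energy, which is the behavior an echo canceller is meant to provide.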

Multiplexor (MUX) circuit 310 selects between the voice
control DSP circuit 306 and the data pump DSP circuit 311 for transmission
of information on the telephone line through telephone line interface circuit
309.

The data pump circuit 311 also includes a digital signal
processor (DSP) and a CODEC for communicating over the telephone line
interface 309 through MUX circuit 310. The data pump DSP and CODEC of
circuit 311 perform functions such as modulation, demodulation and echo
cancellation to communicate over the telephone line interface 309 using a
plurality of telecommunications standards including FAX and modem
protocols.

The main controller circuit 313 controls the DSP data pump
circuit 311 and the voice control DSP circuit 306 through serial input/output
and clock timer control (SIO/CTC) circuits 312 and dual port RAM circuit
308, respectively. The main controller circuit 313 communicates with the
voice control DSP 306 through dual port RAM circuit 308. In this fashion
digital voice data can be read and written simultaneously to the memory
portions of circuit 308 for high speed communication between the user
(through interfaces 301, 302 or 303/304) and the personal computer connected
to serial interface circuit 315 and the remote telephone connection connected
through the telephone line attached to line interface circuit 309.

As described more fully below, the main controller circuit 313
includes, in the preferred embodiment, a microprocessor which controls the
functions and operation of all of the hardware components shown in Figure 2.
The main controller is connected to RAM circuit 316 and an electrically
erasable programmable read only memory (EEPROM) circuit 317. The
EEPROM circuit 317 includes non-volatile memory in which the executable
control programs for the voice control DSP circuit 306 and the main
controller circuit 313 are stored.

The RS232 serial interface circuit 315 communicates with the
serial port of the personal computer which is running the software components
of the present system. The RS232 serial interface circuit 315 is connected
through a serial input/output (SIO) circuit 314 to main controller circuit 313.
In the preferred embodiment, SIO circuit 314 is a part of SIO/CTC circuit 312.

Functional Operation of the Hardware Components
Referring once again to Figure 2, the multiple and selectable
functions described in conjunction with Figure 2 are all implemented in the
hardware components of Figure 2. Each of these functions will be discussed
in turn.

The telephone function 115 is implemented by the user either
selecting a telephone number to be dialed from the address book 127 or
manually selecting the number through the telephone menu on the personal
computer. The telephone number to be dialed is downloaded from the
personal computer over the serial interface and received by main controller
313. Main controller 313 causes the data pump DSP circuit 311 to seize the
telephone line and transmit the DTMF tones to dial a number. DSP 306
receives commands from the personal computer via main controller 313 to
configure the digital telephone CODEC circuit 305 to enable either the
handset 301 operation, the microphone 303 and speaker 304 operation or the
headset 302 operation. A telephone connection is established through the
telephone line interface circuit 309 and communication is enabled. The user's
analog voice is transmitted in an analog fashion to the digital telephone
CODEC 305 where it is digitized. The digitized voice patterns are passed to
the voice control circuit 306 where echo cancellation is accomplished, the
digital voice signals are reconstructed into analog signals and passed through
multiplexor circuit 310 to the telephone line interface circuit 309 for analog
transmission over the telephone line. The incoming analog voice from the
telephone connection through telephone line interface circuit 309 is passed to the
integral CODEC of the voice control circuit 306 where it is digitized. The
digitized incoming voice is then passed to digital telephone CODEC circuit
305 where it is reconverted to an analog signal for transmission to the
selected telephone interface (either the handset 301, the microphone/speaker
303/304 or the headset 302). Voice Control DSP circuit 306 is programmed
to perform echo cancellation to avoid feedback and echoes between
transmitted and received signals, as is more fully described below.
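The DTMF dialing step can be illustrated by generating the dual tones in software. The row and column frequencies below are the standard DTMF pairs; the 8 kHz sample rate, 100 ms tone length and 0.5 amplitude scaling are illustrative assumptions, and in the described hardware the tones are transmitted by the data pump DSP circuit 311 rather than by host code like this.

```python
import math
from typing import List, Tuple

# Standard DTMF low-group (row) and high-group (column) frequencies in Hz.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_freqs(key: str) -> Tuple[int, int]:
    """Return the (row, column) frequency pair for one keypad symbol."""
    if len(key) == 1:
        for r, row in enumerate(KEYPAD):
            c = row.find(key)
            if c >= 0:
                return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.1,
                 rate_hz: int = 8000) -> List[float]:
    """Generate one DTMF tone as the sum of its two sinusoids, scaled to [-1, 1]."""
    low, high = dtmf_freqs(key)
    n = round(duration_s * rate_hz)
    return [0.5 * (math.sin(2 * math.pi * low * t / rate_hz)
                   + math.sin(2 * math.pi * high * t / rate_hz))
            for t in range(n)]


# Dial a number as a sequence of tones (inter-digit gaps omitted).
tones = [dtmf_samples(d) for d in "5550101"]
```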
In the voice mail function mode of the present system, voice
messages may be stored for later transmission or the present system may
operate as an answering machine receiving incoming messages. For storing
digitized voice, the telephone interface is used to send the analog speech
patterns to the digital telephone CODEC circuit 305. Circuit 305 digitizes the
voice patterns and passes them to voice control circuit 306 where the digitized
voice patterns are digitally compressed. The digitized and compressed voice
patterns are passed through dual port RAM circuit 308 to the main controller
circuit 313 where they are transferred through the serial interface to the
personal computer using a packet protocol defined below. The voice patterns
are then stored on the disk of the personal computer for later use in
multi-media mail, for voice mail, as a pre-recorded answering machine message or
for later predetermined transmission to other sites.

For the present system to operate as an answering machine, the
hardware components of Figure 2 are placed in answer mode. An incoming
telephone ring is detected through the telephone line interface circuit 309 and
the main controller circuit 313 is alerted, which passes the information on to
the personal computer through the RS232 serial interface circuit 315. The
telephone line interface circuit 309 seizes the telephone line to make the
telephone connection. A pre-recorded message may be sent by the personal
computer as compressed and digitized speech through the RS232 interface to
the main controller circuit 313. The compressed and digitized speech from
the personal computer is passed from main controller circuit 313 through dual
port RAM circuit 308 to the voice control DSP circuit 306 where it is
decompressed and converted to analog voice patterns. These analog voice
patterns are passed through multiplexor circuit 310 to the telephone line
interface 309 for transmission to the caller. Such a message may invite the
caller to leave a voice message at the sound of a tone. The incoming voice
messages are received through telephone line interface 309 and passed to
voice control circuit 306. The analog voice patterns are digitized by the
integral CODEC of voice control circuit 306 and the digitized voice patterns
are compressed by the voice control DSP of the voice control circuit 306.
The digitized and compressed speech patterns are passed through dual port
RAM circuit 308 to the main controller circuit 313 where they are transferred
using the packet protocol described below through the RS232 serial interface 315
to the personal computer for storage and later retrieval. In this fashion the
hardware components of Figure 2 operate as a transmit and receive voice mail
system for implementing the voice mail function 117 of the present system.

The hardware components of Figure 2 may also operate to
facilitate the fax manager function 119 of Figure 2. In fax receive mode, an
incoming telephone call will be detected by a ring detect circuit of the
telephone line interface 309 which will alert the main controller circuit 313 to
the incoming call. Main controller circuit 313 will cause line interface circuit
309 to seize the telephone line to receive the call. Main controller circuit 313
will also concurrently alert the operating programs on the personal computer
through the RS232 interface using the packet protocol described below. Once
the telephone line interface seizes the telephone line, a fax carrier tone is
transmitted and a return tone and handshake are received from the telephone
line and detected by the data pump circuit 311. The reciprocal transmission and
receipt of the fax tones indicate the imminent receipt of a facsimile
transmission, and the main controller circuit 313 configures the hardware
components of Figure 2 for the receipt of that information. The necessary
handshaking with the remote facsimile machine is accomplished through the
data pump 311 under control of the main controller circuit 313. The
incoming data packets of digital facsimile data are received over the telephone
line interface and passed through data pump circuit 311 to main controller
circuit 313 which forwards the information on a packet basis (using the packet
protocol described more fully below) through the serial interface circuit 315 to
the personal computer for storage on disk. Those skilled in the art will
readily recognize that the FAX data could be transferred from the telephone
line to the personal computer using the same path as the packet transfer
except using the normal AT stream mode. Thus the incoming facsimile is
automatically received and stored on the personal computer through the
hardware components of Figure 2.
A facsimile transmission is also facilitated by the hardware
components of Figure 2. The transmission of a facsimile may be immediate
or queued for later transmission at a predetermined or preselected time.
Control packet information to configure the hardware components to send a
facsimile is sent over the RS232 serial interface between the personal
computer and the hardware components of Figure 2 and is received by main
controller circuit 313. The data pump circuit 311 then dials the recipient's
telephone number using DTMF tones or pulse dialing over the telephone line
interface circuit 309. Once an appropriate connection is established with the
remote facsimile machine, standard facsimile handshaking is accomplished by
the data pump circuit 311. Once the facsimile connection is established, the
digital facsimile picture information is received through the data packet
protocol transfer over serial line interface circuit 315, passed through main
controller circuit 313 and data pump circuit 311 onto the telephone line
through telephone line interface circuit 309 for receipt by the remote facsimile
machine.

The operation of the multi-media mail function 121 of Figure 2
is also facilitated by the hardware components of Figure 2. A multimedia
transmission consists of a combination of picture information, digital data and
digitized voice information. For example, the multimedia information
transferred to a remote site using the hardware components of Figure 2 could
be in the Microsoft® Multimedia Wave® format, with
the aid of an Intelligent Serial Interface (ISI) card added to the personal
computer. The multimedia may also be the type of multimedia information
assembled by the software component of the present system which is
described more fully below.

The multimedia package of information including text, graphics
and voice messages (collectively called the multimedia document) may be
transmitted or received through the hardware components shown in Figure 2.
For example, the transmission of a multimedia document through the
hardware components of Figure 2 is accomplished by transferring the
multimedia digital information using the packet protocol described below over
the RS232 serial interface between the personal computer and the serial line
interface circuit 315. The packets are then transferred through main controller
circuit 313 and the data pump circuit 311 onto the telephone line for
receipt at a remote site through telephone line interface circuit 309. In a
similar fashion, the multimedia documents received over the telephone line
from the remote site are received at the telephone line interface circuit 309 and
passed through the data pump circuit 311 for receipt and forwarding by the
main controller circuit 313 over the serial line interface circuit 315.
The show and tell function 123 of the present system allows the
user to establish a data over voice communication session. In this mode of
operation, full duplex data transmission may be accomplished simultaneously
with the voice communication between both sites. This mode of operation
assumes a like-configured remote site. The hardware components of the
present system also include a means for sending voice/data over cellular links.
The protocol used for transmitting multiplexed voice and data includes a
supervisory packet, described more fully below, to keep the link established
through the cellular link. This supervisory packet is an acknowledgement that
the link is still up. The supervisory packet may also contain link information
to be used for adjusting various link parameters when needed. This
supervisory packet is sent every second when data is not being sent, and if the
packet is not acknowledged after a specified number of attempts, the protocol
would then give an indication that the cellular link is down and then allow the
modem to take action. The action could be, for example, to change speeds,
retrain, or hang up. The use of supervisory packets is a novel method of
maintaining inherently intermittent cellular links when transmitting
multiplexed voice and data.
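The supervisory-packet rule above (send one every second while no data is flowing, and declare the link down after a number of unacknowledged attempts) can be sketched as a small tick-driven monitor. `SupervisoryMonitor`, its fields and the default of three attempts are hypothetical; the text leaves the attempt count unspecified and does not say at this point how the acknowledgement is carried.

```python
from dataclasses import dataclass


@dataclass
class SupervisoryMonitor:
    """Hypothetical keep-alive monitor for a multiplexed voice/data link."""
    interval_s: float = 1.0   # send period while no data is being sent
    max_attempts: int = 3     # unacknowledged packets tolerated (assumed value)
    idle_s: float = 0.0
    unacked: int = 0
    link_down: bool = False

    def on_data_sent(self) -> None:
        """Outgoing data makes a supervisory packet unnecessary for now."""
        self.idle_s = 0.0

    def on_ack(self) -> None:
        """The remote site acknowledged the last supervisory packet."""
        self.unacked = 0

    def tick(self, elapsed_s: float) -> bool:
        """Advance time; return True when a supervisory packet should be sent."""
        if self.link_down:
            return False
        self.idle_s += elapsed_s
        if self.idle_s < self.interval_s:
            return False
        self.idle_s = 0.0
        if self.unacked >= self.max_attempts:
            # The modem may now change speed, retrain or hang up.
            self.link_down = True
            return False
        self.unacked += 1
        return True


m = SupervisoryMonitor(max_attempts=3)
print([m.tick(1.0) for _ in range(4)], m.link_down)  # [True, True, True, False] True
```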

The voice portion of the voice over data transmission of the
show and tell function is accomplished by receiving the user's voice through
the telephone interface 301, 302 or 303, and the voice information is digitized
by the digital telephone CODEC circuit 305. The digitized voice information is passed
to the voice control circuit 306 where the digitized voice information is
compressed using a voice compression algorithm described more fully below.
The digitized and compressed voice information is passed through dual port
RAM circuit 308 to the main controller circuit 313. During quiet periods of
the speech, a quiet flag is passed from voice control circuit 306 to the main
controller 313 through dual port RAM circuit 308 using a packet transfer
protocol described below.
Simultaneous with the digitizing, compression and packetizing
of the voice information is the receipt of the packetized digital information
from the personal computer over interface line circuit 315 by main controller
circuit 313. Main controller circuit 313 in the show and tell function of the
present system must efficiently and effectively combine the digitized voice
information with the digital information for transmission over the telephone
line via telephone line interface circuit 309. As described above and as
described more fully below, main controller circuit 313 dynamically changes
the amount of voice information and digital information transmitted at any
given period of time depending upon the quiet times during the voice
transmissions. For example, during a quiet moment where there is no speech
information being transmitted, main controller circuit 313 ensures that a
higher volume of digital data information is transmitted over the telephone
line interface in lieu of digitized voice information.
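The quiet-period reallocation can be sketched per transmission interval. In this hypothetical Python sketch, voice is given priority because it must be delivered in real time, and whatever capacity voice does not use in an interval (all of it when the quiet flag is set) goes to pending digital data. The byte-based framing and the function name are assumptions for illustration.

```python
from typing import Tuple


def allocate_interval(link_bytes: int, voice_bytes: int, quiet: bool,
                      data_pending: int) -> Tuple[int, int]:
    """Split one interval's link capacity between voice and data.

    Returns (voice_bytes_sent, data_bytes_sent). During a quiet period no
    voice is sent and the full capacity is offered to data.
    """
    voice = 0 if quiet else min(voice_bytes, link_bytes)
    data = min(data_pending, link_bytes - voice)
    return voice, data


print(allocate_interval(120, 40, quiet=False, data_pending=500))  # (40, 80)
print(allocate_interval(120, 40, quiet=True, data_pending=500))   # (0, 120)
```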

Also, as described more fully below, the transmission packet protocol used
for packets sent over the telephone line interface requires 100 percent
accuracy in the transmission of the digital data, but a lesser standard of
accuracy for the transmission and receipt of the digitized voice information.
Since digital information must be transmitted with 100 percent accuracy, a
corrupted packet of digital information received at the remote site must be
retransmitted. A retransmission signal is communicated back to the local site
and the packet of digital information which was corrupted during transmission
is retransmitted. If the packet transmitted contained voice data, however, the
remote site uses the packet whether or not it was corrupted, as long as the
packet header is intact. If the header is corrupted, the packet is discarded.
Thus, the voice information may be corrupted without requesting
retransmission, since the voice information must be transmitted on a real-time
basis and the corruption of any digital information of the voice signal is not
critical. In contrast, the transmission of digital data is critical and
retransmission of corrupted data packets is requested by the remote site.
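The receive-side rule described above can be summarized as a small decision function. This is a minimal sketch, not the patent's firmware: the names `Action` and `receive_action` are hypothetical, and the case of a data packet with a corrupted header (which the text does not address separately) is assumed here to trigger a retransmission request like any other corrupted data packet.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"               # hand the packet to its consumer
    REQUEST_RETRANSMIT = "resend"   # signal the local site to retransmit
    DISCARD = "discard"             # drop silently, no retransmission


def receive_action(is_voice: bool, header_ok: bool, payload_ok: bool) -> Action:
    """Decide what the remote site does with one incoming packet.

    Voice packets are real-time: they are used even with a corrupted payload
    as long as the header is intact, and discarded if the header is corrupted.
    Data packets must arrive error-free, so any corruption (assumed here to
    include a corrupted header) triggers a retransmission request.
    """
    if is_voice:
        return Action.ACCEPT if header_ok else Action.DISCARD
    if header_ok and payload_ok:
        return Action.ACCEPT
    return Action.REQUEST_RETRANSMIT
```

A voice packet with a damaged payload but a clean header is therefore played as-is, while the same damage on a data packet costs a round trip.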

The transmission of the digital data follows the CCITT V.42
standard, as is well known in the industry and as described in the CCITT Blue
Book, volume VIII, entitled Data Communication over the Telephone Network,
1989. The voice data packet information also follows the CCITT V.42
standard, but uses a different header format so the receiving site recognizes
the difference between a data packet and a voice packet. The voice packet is
distinguished from a data packet by using undefined bits in the header (80
hex) of the V.42 standard. The packet protocol for voice over data
transmission during the show and tell function of the present system is
described more fully below.

Since the voice over data communication with the remote site is
full-duplex, incoming data packets and incoming voice packets are received
by the hardware components of Figure 2. The incoming data packets and
voice packets are received through the telephone line interface circuit 309 and
passed to the main controller circuit 313 via data pump DSP circuit 311. The
incoming data packets are passed by the main controller circuit 313 to the
serial interface circuit 315 to be passed to the personal computer. The
incoming voice packets are passed by the main controller circuit 313 to the
dual port RAM circuit 308 for receipt by the voice control DSP circuit 306.
The voice packets are decoded and the compressed digital information therein
is uncompressed by the voice control DSP of circuit 306. The uncompressed
digital voice information is passed to digital telephone CODEC circuit 305
where it is reconverted to an analog signal and retransmitted through the
telephone line interface circuits. In this fashion, full-duplex voice and data
transmission and reception are accomplished through the hardware components
of Figure 2 during the show and tell functional operation of the present
system.

Terminal operation 125 of the present system is also supported
by the hardware components of Figure 2. Terminal operation means that the
local personal computer simply operates as a "dumb" terminal including file
transfer capabilities. Thus no local processing takes place other than the
handshaking protocol required for the operation of a dumb terminal. In
terminal mode operation, the remote site is assumed to be a modem connected
to a personal computer, but the remote site is not necessarily a site which is
configured according to the present system. In terminal mode of operation,
the command and data information from the personal computer is transferred over
the RS232 serial interface circuit 315 and forwarded by main controller circuit
313 to the data pump circuit 311, where the data is placed on the telephone
line via telephone line interface circuit 309.

In a reciprocal fashion, data is received from the telephone line
over telephone line interface circuit 309 and simply forwarded by the data
pump circuit 311 and the main controller circuit 313 over the serial line
interface circuit 315 to the personal computer.

As described above, and more fully below, the address book
function of the present system is primarily a support function for providing
telephone numbers and addresses for the other various functions of the present
system.

Packet Protocol Between the PC and the Hardware Component
A special packet protocol is used for communication between
the hardware components 20 and the personal computer (PC) 10. The
protocol is used for transferring different types of information between the
two devices, such as DATA, VOICE, and QUALIFIED information. The
protocol also uses the BREAK as defined in CCITT X.28 as a means to
maintain protocol synchronization. This BREAK sequence is also described
in the Statutory Invention Registration entitled "ESCAPE METHODS FOR
MODEM COMMUNICATIONS", to Timothy D. Gunn, filed January 8, 1993.

The protocol has two modes of operation. One mode is packet
mode and the other is stream mode. The protocol allows mixing of different
types of information into the data stream without having to physically switch
modes of operation. The hardware component 20 will identify the packet
received from the computer 10 and perform the appropriate action according
to the specifications of the protocol. If it is a data packet, then the controller
313 of hardware component 20 sends it to the data pump circuit 311. If
the packet is a voice packet, then the controller 313 of hardware component
20 distributes that information to the Voice DSP 306. This packet
transfer mechanism also works in reverse, where the controller 313 of
hardware component 20 gives different information to the computer 10
without having to switch into different modes. The packet protocol also
allows commands to be sent either to the main controller 313 directly or to
the Voice DSP 306 for controlling different options without having to enter a
command state.



Packet mode is made up of 8 bit asynchronous data and is
identified by a beginning synchronization character (01 hex) followed by an
ID/LI character and then followed by the information to be sent. In addition
to the ID/LI character codes defined below, those skilled in the art will readily
recognize that other ID/LI character codes could be defined to allow for
additional types of packets such as video data, or alternate voice compression
algorithm packets such as Codebook Excited Linear Predictive Coding (CELP)
algorithm, GSM, RPE, VSELP, etc.

Stream mode is used when large amounts of one type of packet
(VOICE, DATA, or QUALIFIED) are being sent. The transmitter tells the
receiver to enter stream mode by a unique command. Thereafter, the
transmitter tells the receiver to terminate stream mode by using the BREAK
command followed by an "AT" type command. The command used to
terminate the stream mode can be a command to enter another type of stream
mode or it can be a command to enter back into packet mode.

Currently there are 3 types of packets used: DATA, VOICE,
and QUALIFIED. Table 1 shows the common packet parameters used for all
three packet types. Table 2 shows the three basic types of packets with the
sub-types listed.



TABLE 1: Packet Parameters

1. Asynchronous transfer
2. 8 bits, no parity
3. Maximum packet length of 128 bytes
   - Identifier (ID) byte = 1
   - Information = 127
4. SPEED
   - variable from 9600 to 57600
   - default to 19200

TABLE 2: Packet Types

1. Data
2. Voice
3. Qualified:
   a. COMMAND
   b. RESPONSE
   c. STATUS
   d. FLOW CONTROL
   e. BREAK
   f. ACK
   g. NAK
   h. STREAM

A Data Packet is shown in Table 1 and is used for normal data
transfer between the controller 313 of hardware component 20 and the
computer 10 for such things as text, file transfers, binary data and any other
type of information presently being sent through modems. All packet
transfers begin with a synch character 01 hex (synchronization byte). After
the sync byte, the Data Packet begins with an ID byte which specifies the
packet type and packet length. Table 3 describes the Data Packet byte
structure and Table 4 describes the bit structure of the ID byte of the Data
Packet. Table 5 is an example of a Data Packet with a byte length of 6. The
value of the LI field is the actual length of the data field to follow, not
counting the ID byte.
TABLE 3: Data Packet Byte Structure

byte 1 = 01h (sync byte)
byte 2 = ID/LI (ID byte/length indicator)
bytes 3-127 = data (depending on LI)

01     ID
SYNC   LI     data   data   ...   data



TABLE 4: ID Byte of Data Packet

Bit 7 identifies the type of packet
Bits 6-0 contain the LI or length indicator portion of the ID byte

LI (Length Indicator) = 1 to 127



TABLE 5: Data Packet Example

LI (length indicator) = 6

01     06
SYNC   ID     data   data   data   data   data   data
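The Data Packet framing of Tables 3 to 5 can be sketched as follows. This is an illustrative sketch only: `build_data_packet` and `decode_id_byte` are hypothetical names, and the assumption that bit 7 of a Data Packet ID byte is 0 is inferred from Table 8, where bit 7 is 1 for a Qualified Packet.

```python
SYNC = 0x01  # synchronization byte that starts every packet


def build_data_packet(payload: bytes) -> bytes:
    """Frame a Data Packet: SYNC, ID/LI byte, then the data bytes.

    Assumes bit 7 of the ID byte is 0 for a Data Packet, so the ID byte
    equals LI, the number of data bytes that follow (1 to 127).
    """
    if not 1 <= len(payload) <= 127:
        raise ValueError("Data Packet payload must be 1 to 127 bytes")
    return bytes([SYNC, len(payload)]) + payload


def decode_id_byte(id_byte: int) -> tuple:
    """Split an ID byte into (packet-type bit 7, length indicator bits 6-0)."""
    return (id_byte >> 7) & 0x01, id_byte & 0x7F
```

Framing the six-byte payload of Table 5 yields `01 06` followed by the six data bytes; decoding the ID byte `0x85` of Table 9 gives type bit 1 with LI = 5.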



The Voice Packet is used to transfer compressed VOICE
messages between the controller 313 of hardware component 20 and the
computer 10. The Voice Packet is similar to the Data Packet except for its
length, which is, in the preferred embodiment, currently fixed at 23 bytes of
data. Once again, all packets begin with a synchronization character chosen
in the preferred embodiment to be 01 hex (01H). The ID byte of the Voice
Packet is completely a zero byte: all bits are set to zero. Table 6 shows the
ID byte of the Voice Packet and Table 7 shows the Voice Packet byte
structure.
TABLE 6: ID Byte of Voice Packet

76543210
00000000

LI (Length Indicator) = 0



TABLE 7: Voice Packet Byte Structure

LI (length indicator) = 0
23 bytes of data (depending upon compression algorithm used)

01     00
SYNC   ID     data   data   data   data   ...   data
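Because the Voice Packet has a fixed length, its all-zero ID byte needs no length field. A minimal framing sketch, assuming the 23-byte payload of the preferred embodiment; `build_voice_packet` is a hypothetical name.

```python
SYNC = 0x01              # synchronization byte that starts every packet
VOICE_ID = 0x00          # all-zero ID byte marks a Voice Packet
VOICE_PAYLOAD_LEN = 23   # fixed payload length in the preferred embodiment


def build_voice_packet(frame: bytes) -> bytes:
    """Frame one compressed voice frame as SYNC, ID (00h), 23 data bytes."""
    if len(frame) != VOICE_PAYLOAD_LEN:
        raise ValueError(f"voice payload must be {VOICE_PAYLOAD_LEN} bytes")
    return bytes([SYNC, VOICE_ID]) + frame
```

A receiver that sees ID byte 00h therefore knows to read exactly 23 more bytes.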



The Qualified Packet is used to transfer commands and other
non-data/voice related information between the controller 313 of hardware
component 20 and the computer 10. The various species or types of the
Qualified Packets are described below and are listed above in Table 2. Once
again, all packets start with a synchronization character chosen in the
preferred embodiment to be 01 hex (01H). A Qualified Packet starts with two
bytes where the first byte is the ID byte and the second byte is the
QUALIFIER type identifier. Table 8 shows the ID byte for the Qualified
Packet, Table 9 shows the byte structure of the Qualified Packet and Tables
10-12 list the Qualifier Type byte bit maps for the three types of Qualified
Packets.

TABLE 8: ID Byte of Qualified Packet

Bit 7 = 1
Bits 6-0 = LI (Length Indicator) = 1 to 127

The Length Indicator of the ID byte equals the amount of data
which follows, including the QUALIFIER byte (QUAL byte + DATA). If LI
= 1, then the Qualified Packet contains the QUAL byte only.
TABLE 9: Qualifier Packet Byte Structure

01     85     QUAL
SYNC   ID     BYTE   data   data   ...   data
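With all three packet layouts defined, a receiver in packet mode can split an incoming byte stream by looking only at each ID byte. The sketch below is a hypothetical illustration (`Packet` and `parse_packets` are invented names). It assumes the rules of Tables 3 to 9: ID 00h is a 23-byte Voice Packet, bit 7 set is a Qualified Packet whose LI counts the QUAL byte plus data, and otherwise the ID byte is the Data Packet LI. Stream mode and BREAK handling are deliberately left out.

```python
from typing import Iterator, NamedTuple, Optional

SYNC = 0x01               # every packet starts with 01 hex
VOICE_PAYLOAD_LEN = 23    # fixed Voice Packet length, preferred embodiment


class Packet(NamedTuple):
    kind: str             # "data", "voice" or "qualified"
    qual: Optional[int]   # QUAL byte of a Qualified Packet, else None
    body: bytes           # data bytes after the ID (and QUAL) byte


def parse_packets(stream: bytes) -> Iterator[Packet]:
    """Split a packet-mode byte stream into Data, Voice and Qualified Packets."""
    i = 0
    while i < len(stream):
        if stream[i] != SYNC or i + 1 >= len(stream):
            raise ValueError(f"missing sync or ID byte at offset {i}")
        id_byte = stream[i + 1]
        start = i + 2
        if id_byte == 0x00:                   # all-zero ID: Voice Packet
            kind, end = "voice", start + VOICE_PAYLOAD_LEN
        elif id_byte & 0x80:                  # bit 7 set: Qualified Packet
            li = id_byte & 0x7F               # LI counts QUAL byte + data
            if li == 0:
                raise ValueError(f"Qualified Packet with LI = 0 at offset {i}")
            kind, end = "qualified", start + li
        else:                                 # bit 7 clear: Data Packet
            kind, end = "data", start + id_byte   # ID byte equals LI here
        if end > len(stream):
            raise ValueError(f"truncated packet at offset {i}")
        if kind == "qualified":
            yield Packet(kind, stream[start], stream[start + 1:end])
        else:
            yield Packet(kind, None, stream[start:end])
        i = end
```

For example, a stream holding a 3-byte Data Packet, a Voice Packet and a one-byte ACK Qualified Packet (`01 81 02`) parses into three packets in order.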



The bit maps of the Qualifier Byte (QUAL BYTE) of the
Qualified Packet are shown in Tables 10-12. The bit map follows the pattern
whereby if the QUAL byte = 0, then the command is a break. Also, bit 1 of
the QUAL byte designates ack/nak, bit 2 designates flow control and bit 6
designates stream mode command. Table 10 describes the Qualifier Byte of
Qualified Packet, Group 1, which are immediate commands. Table 11
describes the Qualifier Byte of Qualified Packet, Group 2, which are stream
mode commands, in that the command is to stay in the designated mode until
a BREAK + INIT command string is sent. Table 12 describes the Qualifier
Byte of Qualified Packet, Group 3, which are information or status commands.

TABLE 10: Qualifier Byte of Qualified Packet: Group 1

76543210
xxxxxxxx

00000000 = break
00000010 = ACK
00000011 = NAK
00000100 = xoff or stop sending data
00000101 = xon or resume sending data
00001000 = cancel fax

TABLE 11: Qualifier Byte of Qualified Packet: Group 2

76543210
xxxxxxxx

01000001 = stream command mode
01000010 = stream data
01000011 = stream voice
01000100 = stream video
01000101 = stream A
01000110 = stream B
01000111 = stream C

The Qualifier Packet indicating stream mode and BREAK
attention is used when a large amount of information is sent (voice, data...)
to allow the highest throughput possible. This command is mainly intended
for use in DATA mode but can be used in any one of the possible modes. To
change from one mode to another, a break-init sequence would be given. A
break "AT...<cr>" type command would cause a change in state and set the
serial rate from the "AT" command.
TABLE 12: Qualifier Byte of Qualified Packet: Group 3

76543210
xxxxxxxx

10000000 = commands
10000001 = responses
10000010 = status
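Tables 10 to 12 amount to a lookup from QUAL byte value to command group and meaning. The sketch below simply transcribes those bit maps into hexadecimal; `QUAL_GROUPS` and `describe_qual` are hypothetical names, and treating any unlisted value as an error is an assumption.

```python
# QUAL byte values transcribed from Tables 10-12 (binary shown in comments).
QUAL_GROUPS = {
    1: {0x00: "break",            # 00000000
        0x02: "ACK",              # 00000010
        0x03: "NAK",              # 00000011
        0x04: "xoff",             # 00000100
        0x05: "xon",              # 00000101
        0x08: "cancel fax"},      # 00001000
    2: {0x41: "stream command mode", 0x42: "stream data",
        0x43: "stream voice", 0x44: "stream video",
        0x45: "stream A", 0x46: "stream B", 0x47: "stream C"},  # 01000xxx
    3: {0x80: "commands", 0x81: "responses", 0x82: "status"},  # 100000xx
}


def describe_qual(qual: int) -> tuple:
    """Return (group number, meaning) for a QUAL byte."""
    for group, codes in QUAL_GROUPS.items():
        if qual in codes:
            return group, codes[qual]
    raise ValueError(f"undefined QUAL byte {qual:#04x}")
```

Note that every Group 2 code has bit 6 set, matching the statement that bit 6 designates a stream mode command.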

Cellular Supervisory Packet

In order to determine the status of the cellular link, a
supervisory packet shown in Table 13 is used. Both sides of the cellular link
will send the cellular supervisory packet every 3 seconds. Upon receiving the
cellular supervisory packet, the receiving side will acknowledge it using the
ACK field of the cellular supervisory packet. If the sender does not receive
an acknowledgement within one second, it will repeat sending the cellular
supervisory packet up to 12 times. After 12 attempts of sending the cellular
supervisory packet without an acknowledgement, the sender will disconnect
the line. Upon receiving an acknowledgement, the sender will restart its 3
second timer. Those skilled in the art will readily recognize that the timer
values and wait times selected here may be varied without departing from the
spirit or scope of the present invention.
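The supervisory retry rule can be replayed against a sequence of acknowledgement outcomes to check when the line would be dropped. This is a sketch under one reading of the text: "12 attempts" is taken as 12 consecutive unacknowledged transmissions in total, and the names `supervise` and `worst_case_detection_s` are hypothetical.

```python
SUPERVISORY_PERIOD_S = 3.0   # interval between supervisory packets on a healthy link
ACK_TIMEOUT_S = 1.0          # how long the sender waits for an ACK
MAX_ATTEMPTS = 12            # unacknowledged sends before disconnecting


def supervise(ack_results):
    """Replay the sender's retry rule against per-send ACK outcomes.

    ack_results is an iterable of booleans: True if that transmission of the
    supervisory packet was acknowledged within ACK_TIMEOUT_S.
    Returns (packets_sent, disconnected).
    """
    sent = 0
    misses = 0
    for acked in ack_results:
        sent += 1
        if acked:
            misses = 0              # ACK received: restart the 3 s timer
        else:
            misses += 1             # no ACK within 1 s: resend ...
            if misses == MAX_ATTEMPTS:
                return sent, True   # ... until 12 attempts have failed
    return sent, False


def worst_case_detection_s():
    """Seconds from the last ACK until a dead link is disconnected."""
    return SUPERVISORY_PERIOD_S + MAX_ATTEMPTS * ACK_TIMEOUT_S
```

Under this reading, a link that goes silent is dropped about 15 seconds after the last acknowledgement: 3 seconds until the next scheduled packet, then twelve 1-second waits.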

TABLE 13: Cellular Supervisory Packet Byte Structure 


8F ID LI ACK data data data 

Speech Compression

The Speech Compression algorithm described above for use in
transmitting voice over data is accomplished via the voice control circuit 306.
Referring once again to Figure 2, the user is talking either through the
handset, the headset or the microphone/speaker telephone interface. The
analog voice signals are received and digitized by the telephone CODEC
circuit 305. The digitized voice information is passed from the digital






WO 96/17465 




PCT/US95/14826 



telephone CODEC circuit 305 to the voice control circuits 306. The digital
signal processor (DSP) of the voice control circuit 306 is programmed to do
the voice compression algorithm. The source code programmed into the voice
control DSP is in the microfiche appendix of U.S. Patent Application Serial
No. 08/002,467, filed January 8, 1993, entitled "COMPUTER-BASED
MULTIFUNCTION PERSONAL COMMUNICATIONS SYSTEM". The
DSP of the voice control circuit 306 compresses the speech and places the
compressed digital representations of the speech into special packets described
more fully below. The compressed voice information produced by the
algorithm is passed to the dual port RAM circuit 308, either for forwarding to
and storage on the disk of the personal computer via the RS232 serial
interface, or for multiplexing with conventional modem data to be transmitted
over the telephone line via the telephone line interface circuit 309 in the
voice-over-data mode of operation (Show and Tell function 123).
The compressed speech bits are multiplexed with data bits using a packet
format described below. Three compression rates are described herein, which
will be called 8Kbit/sec, 9.6Kbit/sec and 16Kbit/sec.

Speech Compression Algorithm
To multiplex high-fidelity speech with digital data and transmit
both over the telephone line, a high available bandwidth would normally be
required. In the present invention, the analog voice information is digitized
into 8-bit PCM data at an 8 KHz sampling rate, producing a serial bit stream
with a 64,000 bps data rate. This rate cannot be transmitted over the
telephone line. With the Speech Compression algorithm described below, the
64 Kbps digital voice data is compressed into a 9500 bps encoded bit stream
using a fixed-point (non-floating point) DSP such that the compressed speech
can be transmitted over the telephone line multiplexed with asynchronous
data. This is accomplished in an efficient manner such that enough machine
cycles remain during real time speech compression to allow for echo
cancellation in the same fixed-point DSP.
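The rates quoted above imply the compression factor the algorithm must reach. A short arithmetic check using only the figures in the text:

```python
SAMPLE_RATE_HZ = 8000      # CODEC sampling rate
BITS_PER_SAMPLE = 8        # 8-bit PCM
COMPRESSED_BPS = 9500      # rate after speech compression

pcm_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE      # uncompressed PCM rate
compression_ratio = pcm_bps / COMPRESSED_BPS    # how much the coder must shrink it

print(f"PCM: {pcm_bps} bps, compressed: {COMPRESSED_BPS} bps, "
      f"ratio {compression_ratio:.2f}:1")
# prints "PCM: 64000 bps, compressed: 9500 bps, ratio 6.74:1"
```

A roughly 6.7:1 reduction is what lets the voice share the modem channel with asynchronous data.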

A silence detection function is used to detect quiet intervals in
the speech signal, which allows the data processor to substitute asynchronous
data in lieu of voice data packets over the telephone line to efficiently time
multiplex the voice and asynchronous data transmission. The allocation of
time for asynchronous data transmission is constantly changing depending on
how much silence is on the voice channel.
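The text does not specify how silence is detected, so the following is only an illustration of the time-multiplexing idea using a common stand-in: a mean-square energy threshold per frame. The threshold value, and the names `mean_square` and `allocate_slots`, are assumptions and not the patent's method.

```python
def mean_square(frame):
    """Average power of one frame of linear PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def allocate_slots(frames, threshold):
    """Label each frame slot 'voice' if it carries speech, else 'data'.

    A slot whose energy is at or below the (assumed) threshold is treated
    as silence and handed to asynchronous data instead of voice.
    """
    return ["voice" if mean_square(f) > threshold else "data" for f in frames]
```

For example, `allocate_slots([[0, 0, 0, 0], [1000, -1000, 1000, -1000], [1, -1, 1, -1]], threshold=100)` gives `["data", "voice", "data"]`, so two of the three slots become available for data.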
The voice compression algorithm of the present system relies
on a model of human speech which shows that human speech contains
redundancy inherent in the voice patterns. Only the incremental innovations
(changes) need to be transmitted. The algorithm operates on 128 digitized
speech samples (20 milliseconds at 6400 Hz), divides the speech samples into
time segments of 32 samples (5 milliseconds) each, and uses predictive coding
on each segment. Thus, the input to the algorithm could be either PCM data
sampled at 6400 Hz or 8000 Hz. If the sampling is at 8000 Hz, or any other
selected sampling rate, the input sample data stream must be decimated from
8000 Hz to 6400 Hz before processing the speech data. At the output, the
6400 Hz PCM signal is interpolated back to 8000 Hz and passed to the
CODEC.
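The 8000 Hz to 6400 Hz conversion is a rational 4/5 rate change, and the return path is 5/4. The sketch below shows the idea with plain linear interpolation; this is a swapped-in, simplified technique (a production decimator would low-pass filter first to avoid aliasing, for example with a polyphase filter), and `resample_linear` is a hypothetical name.

```python
def resample_linear(x, in_rate, out_rate):
    """Resample a list of samples by linear interpolation.

    No anti-aliasing filter is applied, so this only illustrates the
    sample-position arithmetic of a 4/5 or 5/4 rate change.
    """
    n_out = len(x) * out_rate // in_rate
    y = []
    for k in range(n_out):
        # Exact input position k * in_rate / out_rate, split into
        # an integer index and a fractional part without float drift.
        i, rem = divmod(k * in_rate, out_rate)
        frac = rem / out_rate
        nxt = x[i + 1] if i + 1 < len(x) else x[i]   # hold last sample at edge
        y.append(x[i] + frac * (nxt - x[i]))
    return y
```

One 20 ms block of 160 samples at 8000 Hz becomes the 128-sample frame the algorithm operates on, and interpolating those 128 samples back to 8000 Hz restores 160 samples for the CODEC.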

With this algorithm, the current segment is predicted as best as
possible based on the past recreated segments and a difference signal is
determined. The difference values are compared to the stored difference
values in a lookup table or code book, and the address of the closest value is
sent to the remote site along with the predicted gain and pitch values for each
segment. In this fashion, the entire 20 milliseconds of speech can be
represented by 190 bits, thus achieving an effective data rate of 9500 bps.

To produce this compression, the present system includes a
unique Vector Quantization (VQ) speech compression algorithm designed to
provide maximum fidelity with minimum compute power and bandwidth. The
VQ algorithm has two major components. The first section reduces the
dynamic range of the input speech signal by removing short term and long
term redundancies. This reduction is done in the waveform domain, with the
synthesized part used as the reference for determining the incremental "new"
content. The second section maps the residual signal into a code book
optimized for preserving the general spectral shape of the speech signal.
Figure 3 is a high level signal flow block diagram of the speech
compression algorithm used in the present system to compress the digitized
voice for transmission over the telephone line in the voice over data mode of
operation or for storage and use on the personal computer. The transmitter
and receiver components are implemented using the programmable voice
control DSP/CODEC circuit 306 shown in Figure 2.

The DC removal stage 1101 receives the digitized speech signal
and removes the D.C. bias by calculating the long-term average and
subtracting it from each sample. This ensures that the digital samples of the
speech are centered about a zero mean value. The pre-emphasis stage 1103
whitens the spectral content of the speech signal by balancing the extra energy
in the low band with the reduced energy in the high band.

The system finds the innovation in the current speech segment
by subtracting (at 1109) the prediction from reconstructed past samples
synthesized from synthesis stage 1107. This process requires the synthesis of
the past speech samples locally (analysis by synthesis). The synthesis block
1107 at the transmitter performs the same function as the synthesis block
1113 at the receiver. When the reconstructed previous segment of speech is
subtracted from the present segment (before prediction), a difference term is
produced in the form of an error signal. This residual error is used to find the
best match in the code book 1105. The code book 1105 quantizes the error
signal using a code book generated from a representative set of speakers and
environments. A minimum mean squared error match is determined in
segments. In addition, the code book is designed to provide a quantization
error with spectral rolloff (higher quantization error for low frequencies and
lower quantization error for higher frequencies). Thus, the quantization noise
spectrum in the reconstructed signal will always tend to be smaller than the
underlying speech signal.
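The minimum mean squared error match against code book 1105 is, at its core, an exhaustive nearest-neighbour search. A minimal sketch follows; the toy code book in the example is hypothetical, whereas the real one is trained on representative speakers and shaped for spectral rolloff, which this search alone does not capture.

```python
from typing import Sequence


def best_codebook_index(residual: Sequence[float],
                        codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the code vector with minimum squared error.

    Only this index (the code book "address") would be transmitted,
    not the residual samples themselves.
    """
    best_i, best_err = -1, float("inf")
    for i, code in enumerate(codebook):
        err = sum((r - c) ** 2 for r, c in zip(residual, code))
        if err < best_err:
            best_i, best_err = i, err
    return best_i
```

With the toy code book `[[0, 0], [1, 1], [-1, 2]]`, the residual `[0.9, 1.2]` maps to index 1.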

The following description will specifically explain the algorithm
for the 9.6Kbit/sec compression rate, except where specifically stated
otherwise. The discussion is applicable to the other compression rates by
substituting the parameter values found in Table 14, below, and by following
the special instructions for each calculation provided throughout the
discussion.
9.6Kbit/sec Compression Rate Algorithm
For the 9.6Kbit/sec speech compression rate, each frame of
20ms is divided into 4 sub-blocks or segments of 5ms each. Each sub-block
of data consists of a plurality of bits for the long term predictor, a plurality of
bits for the long term predictor gain, a plurality of bits for the sub-block gain,
and a plurality of bits for each code book entry for each 5ms. In the code
book block, each 1.25ms of speech is looked up in a 512 word code book for
the best match. The table entry is transmitted rather than the actual samples.
The code book entries are pre-computed from representative speech segments,
as described more fully below.
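The frame geometry above fixes how many bits the code book indices consume. The arithmetic below uses the 6400 Hz rate, the 20 ms / 5 ms / 1.25 ms segmentation, the 512-entry code book, and the 190-bit frame quoted earlier. Assuming that 190-bit figure applies to this rate, the 46 bits left over would carry the predictor, predictor gain and sub-block gain fields; that split is an inference, not stated in the text.

```python
SAMPLE_RATE_HZ = 6400      # decimated rate for the 9.6Kbit/sec mode
FRAME_MS = 20              # one frame
SUBBLOCKS = 4              # 5 ms sub-blocks per frame
VECTOR_US = 1250           # each code book lookup covers 1.25 ms
CODEBOOK_SIZE = 512        # entries in the code book
FRAME_BITS = 190           # bits per 20 ms frame quoted for this algorithm

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000            # 128
samples_per_subblock = samples_per_frame // SUBBLOCKS            # 32
samples_per_vector = SAMPLE_RATE_HZ * VECTOR_US // 1_000_000     # 8
vectors_per_frame = samples_per_frame // samples_per_vector      # 16
index_bits = (CODEBOOK_SIZE - 1).bit_length()                    # 9
codebook_bits = vectors_per_frame * index_bits                   # 144
side_bits = FRAME_BITS - codebook_bits                           # 46 (inferred)
bit_rate = FRAME_BITS * 1000 // FRAME_MS                         # 9500 bps
```

So each 5 ms sub-block carries four 9-bit code book addresses, one per 8-sample vector.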

On the receiving end 1200, the synthesis block 1113 at the
receiver performs the same function as the synthesis block 1107 at the
transmitter. The synthesis block 1113 reconstructs the original signal from the
voice data packets by using the gain and pitch values and code book address
corresponding to the error signal most closely matched in the code book. The
code book at the receiver is similar to the code book 1105 in the transmitter.
Thus the synthesis block recreates the original pre-emphasized signal. The
de-emphasis stage 1115 inverts the pre-emphasis operation by restoring the
balance of the original speech signal.
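Pre-emphasis at stage 1103 and de-emphasis at stage 1115 form an exact inverse pair when implemented as first-order filters. The text gives neither the filter form nor the coefficient, so the sketch below assumes the common first-order form y(n) = x(n) - b x(n-1) with a hypothetical coefficient b = 0.9. The function names are also hypothetical.

```python
BETA = 0.9  # hypothetical pre-emphasis coefficient (not given in the text)


def pre_emphasis(x, beta=BETA):
    """y(n) = x(n) - beta * x(n-1): boosts high frequencies relative to low."""
    y, prev = [], 0.0
    for s in x:
        y.append(s - beta * prev)
        prev = s
    return y


def de_emphasis(y, beta=BETA):
    """x(n) = y(n) + beta * x(n-1): the exact inverse of pre_emphasis."""
    x, prev = [], 0.0
    for s in y:
        prev = s + beta * prev
        x.append(prev)
    return x
```

Running a signal through both filters returns the original samples (up to floating-point rounding), which is the property the receiver relies on to restore the spectral balance.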

The complete speech compression algorithm is summarized as
follows:

a) Digitally sample the voice to produce a PCM sample bit
   stream sampled at 8,000 samples per second.

b) Decimate the 8,000 samples per second sampled data to
   produce a sampling rate of 6,400 samples per second for
   the 9.6Kbit/sec compression rate (6,000 samples per
   second for the 8Kbit/sec algorithm and 8,000 samples
   per second for the 16Kbit/sec algorithm).

c) Remove any D.C. bias in the speech signal.

d) Pre-emphasize the signal.

e) Find the innovation in the current speech segment by
   subtracting the prediction from reconstructed past
   samples. This step requires the synthesis of the past
   speech samples locally (analysis by synthesis) such that
   the residual error is fed back into the system.

f) Quantize the error signal using a code book generated
   from a representative set of speakers and environments.
   A minimum mean squared error match is determined in
   5ms segments. In addition, the code book is designed to
   provide a quantization error with spectral rolloff (higher
   quantization error for low frequencies and lower
   quantization error for higher frequencies). Thus, the
   quantization noise spectrum in the reconstructed signal
   will always tend to be smaller than the underlying
   speech signal.

g) At the transmitter and the receiver, reconstruct the
   speech from the quantized error signal fed into the
   inverse of the function in step (e) above. Use this
   signal for analysis by synthesis and for the output to the
   reconstruction stage below.

h) Use a de-emphasis filter to reconstruct the output.
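Steps (e) through (g) form a closed loop: the transmitter quantizes the innovation relative to its own local reconstruction, so the receiver, which sees only code book indices, rebuilds exactly the same signal. The toy sketch below shows only that lockstep property. Its predictor (simply repeating the previous reconstructed segment) and its 2-sample code book are hypothetical simplifications, not the long-term predictor, gains and trained code book of the actual algorithm.

```python
def nearest(residual, codebook):
    """Index of the code vector with minimum squared error (step f)."""
    return min(range(len(codebook)),
               key=lambda i: sum((r - c) ** 2
                                 for r, c in zip(residual, codebook[i])))


def encode(segments, codebook):
    """Transmitter: quantize each segment's innovation against local synthesis."""
    recon = [0.0] * len(segments[0])     # locally synthesized previous segment
    indices, local = [], []
    for seg in segments:
        prediction = recon               # toy predictor: previous reconstruction
        residual = [s - p for s, p in zip(seg, prediction)]          # step (e)
        idx = nearest(residual, codebook)                            # step (f)
        recon = [p + c for p, c in zip(prediction, codebook[idx])]   # step (g)
        indices.append(idx)
        local.append(recon)
    return indices, local


def decode(indices, codebook, seg_len):
    """Receiver: rebuild the segments from code book indices alone."""
    recon = [0.0] * seg_len
    out = []
    for idx in indices:
        recon = [p + c for p, c in zip(recon, codebook[idx])]        # step (g)
        out.append(recon)
    return out
```

Because both ends run the same synthesis on the same indices, quantization error does not accumulate as a drift between transmitter and receiver.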

The major advantages of this approach over other low-bit-rate
algorithms are that there is no need for any complicated calculation of
reflection coefficients (no matrix inverse or lattice filter computations). Also,
the quantization noise in the output speech is hidden under the speech signal
and there are no pitch tracking artifacts: the speech sounds "natural", with
only minor increases of background hiss at lower bit-rates. The computational
load is reduced significantly compared to a VSELP algorithm, and variations
of the present algorithm thus provide bit rates of 8, 9.6 and 16 Kbit/sec, and
can also provide bit rates of 9.2Kbit/sec, 9.5Kbit/sec and many other rates.
The total delay through the analysis section is less than 20 milliseconds in the
9.6Kbit/sec embodiment. The present algorithm is accomplished completely
in the waveform domain; no spectral information is computed and no filter
computations are needed.

Detailed Description of the Speech Compression Algorithm

The speech compression algorithm is described in greater detail
with reference to Figures 4 through 7, and with reference to the block diagram
of the hardware components of the present system shown at Figure 2. The
voice compression algorithm operates within the programmed control of the
voice control DSP circuit 306. In operation, the speech or analog voice signal
is received through the telephone interface 301, 302 or 303 and is digitized by
the digital telephone CODEC circuit 305. The CODEC for circuit 305 is a
companding μ-law CODEC. The analog voice signal from the telephone
interface is band-limited to about 3,000 Hz and sampled at a selected
sampling rate by digital telephone CODEC 305. The sample rate in the
9.6Kbit/sec embodiment of the present invention is 8Ksample/sec. Each
sample is encoded into 8-bit PCM data, producing a serial 64kb/s stream. The
digitized samples are passed to the voice control DSP/CODEC of circuit 306.
There, the 8-bit μ-law PCM data is converted to 13-bit linear PCM data. The
13-bit representation is necessary to accurately represent the linear version of
the logarithmic 8-bit μ-law PCM data. With linear PCM data, simpler
mathematics may be performed on the PCM data.

The voice control DSP/CODEC of circuit 306 corresponds to the
single integrated circuit U8 shown in Figures 9A and 9B as a WE® DSP16C
Digital Signal Processor/CODEC from AT&T Microelectronics, which is a
combined digital signal processor and a linear CODEC in a single chip as
described above. The digital telephone CODEC of circuit 305 corresponds to
integrated circuit U12 shown in Figure 9B as a T7540 companding μ-law
CODEC.

The sampled and digitized PCM voice signals from the
telephone μ-law CODEC 305 shown in Figure 2 are passed to the voice
control DSP/CODEC circuit 306 via direct data lines clocked and
synchronized to a clocking frequency. The sample rate in CODEC 305 in this
embodiment of the present invention is 8Ksample/sec. The digital samples
are loaded into the voice control DSP/CODEC one at a time through the serial
input and stored into an internal queue held in RAM, converted to linear PCM
data and decimated to a sample rate of 6.4Ksample/sec. As the samples are
loaded into the end of the queue in the RAM of the voice control DSP, the
samples at the head of the queue are operated upon by the voice compression
algorithm. The voice compression algorithm then produces a greatly
compressed representation of the speech signals in a digital packet form. The
compressed speech signal packets are then passed to the dual port RAM
circuit 308 shown in Figure 2 for use by the main controller circuit 313,
either for transfer in the voice-over-data mode of operation or for transfer to
the personal computer for storage as compressed voice for functions such as
telephone answering machine message data, for use in multi-media
documents and the like.

In the voice-over-data mode of operation, voice control
DSP/CODEC circuit 306 of Figure 2 will be receiving digital voice PCM data
from the digital telephone CODEC circuit 305, compressing it and transferring
it to dual port RAM circuit 308 for multiplexing and transfer over the
telephone line. This is the transmit mode of operation of the voice control
DSP/CODEC circuit 306 corresponding to transmitter block 1100 of Figure 3
and corresponding to the compression algorithm of Figure 4.

Concurrent with this transmit operation, the voice control
DSP/CODEC circuit 306 is receiving compressed voice data packets from
dual port RAM circuit 308, uncompressing the voice data and transferring the
uncompressed and reconstructed digital PCM voice data to the digital
telephone CODEC 305 for digital to analog conversion and eventual transfer
to the user through the telephone interface 301, 302, 304. This is the receive
mode of operation of the voice control DSP/CODEC circuit 306
corresponding to receiver block 1200 of Figure 3 and corresponding to the
decompression algorithm of Figure 5. Thus, the voice-control DSP/CODEC
circuit 306 is processing the voice data in both directions in a full-duplex
fashion.

The voice control DSP/CODEC circuit 306 operates at a clock
frequency of approximately 24.576MHz while processing data at sampling
rates of approximately 8KHz in both directions. The voice
compression/decompression algorithms and packetization of the voice data are
accomplished in a quick and efficient fashion to ensure that all processing is
done in real time without loss of voice information. This is accomplished in
an efficient manner such that enough machine cycles remain in the voice
control DSP circuit 306 during real time speech compression to allow real
time acoustic and line echo cancellation in the same fixed-point DSP.
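The clock and sample rates above set the real-time processing budget. The arithmetic below assumes, for illustration only, that the budget is counted in clock cycles; how many instructions the DSP16C retires per clock is not stated here.

```python
CLOCK_HZ = 24_576_000      # voice control DSP clock, approximately 24.576 MHz
SAMPLE_RATE_HZ = 8_000     # CODEC sample rate in each direction
FRAME_MS = 20              # compression frame length

cycles_per_sample = CLOCK_HZ // SAMPLE_RATE_HZ     # 3072 clock cycles per sample period
cycles_per_frame = CLOCK_HZ * FRAME_MS // 1000     # 491,520 clock cycles per 20 ms frame
```

Those 3072 cycles per 125-microsecond sample period must cover compression, decompression, packetization and echo cancellation for both directions of the full-duplex call.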

In programmed operation, the availability of an eight-bit sample
of PCM voice data from the μ-law digital telephone CODEC circuit 305
causes an interrupt in the voice control DSP/CODEC circuit 306, where the
sample is loaded into internal registers for processing. Once loaded into an
internal register, it is transferred to a RAM address which holds a queue of
samples. The queued PCM digital voice samples are converted from 8-bit
μ-law data to a 13-bit linear data format using table lookup for the conversion.
Those skilled in the art will readily recognize that the digital telephone
CODEC circuit 305 could also be a linear CODEC.
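The table-lookup conversion from 8-bit μ-law to linear PCM can be sketched by precomputing all 256 entries with the standard ITU-T G.711 μ-law expansion. This sketch's output is on a 16-bit scale (range ±32124). The patent describes the linear result as 13-bit, so the exact scaling used in the DSP firmware is an assumption here (a right shift would bring these values down to a narrower range). `ulaw_to_linear` and `ULAW_TABLE` are hypothetical names.

```python
def ulaw_to_linear(code: int) -> int:
    """Expand one G.711 mu-law byte to a signed linear sample (16-bit scale)."""
    code = ~code & 0xFF                  # mu-law bytes are stored inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07        # segment number
    mantissa = code & 0x0F               # step within the segment
    magnitude = ((mantissa << 3) + 0x84) << exponent
    magnitude -= 0x84                    # remove the encoder's bias
    return -magnitude if sign else magnitude


# Precompute the 256-entry lookup table, as the DSP does for each sample.
ULAW_TABLE = [ulaw_to_linear(c) for c in range(256)]
```

At run time each queued sample is converted with a single indexed read, `ULAW_TABLE[sample]`, which is far cheaper on a fixed-point DSP than evaluating the expansion per sample.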

Sample Rate Decimation 
The sampled and digitized PCM voice signals from the telephone μ-law CODEC 305 shown in Figure 2 are passed to the voice control DSP/CODEC circuit 306 via direct data lines clocked and synchronized to a clocking frequency. The sample rate in this embodiment of the present invention is 8Ksample/sec. The digital samples for the 9.6Kbit/sec and 8Kbit/sec algorithms are decimated using a digital decimation process to produce 6.4Ksample/sec and 6Ksample/sec rates, respectively. For the 16Kbit/sec algorithm, no decimation is needed.
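The rate relationships can be checked with a short sketch. The text does not specify the decimation filter, so this is only an illustration, assuming simple linear interpolation between input samples; a practical decimator would low-pass filter the signal before reducing the rate to avoid aliasing. The function names are hypothetical.

```python
from fractions import Fraction
from typing import List

IN_RATE = 8000  # CODEC sample rate, samples/sec


def frame_samples(rate_hz: int, frame_ms: int = 20) -> int:
    """Samples in one 20 ms compression frame at the given rate."""
    return rate_hz * frame_ms // 1000


def decimate_linear(x: List[float], out_rate: int, in_rate: int = IN_RATE) -> List[float]:
    """Resample x from in_rate down to out_rate by linear interpolation.

    Illustration only: no anti-aliasing low-pass filter is applied.
    """
    step = Fraction(in_rate, out_rate)    # input samples per output sample
    n_out = len(x) * out_rate // in_rate
    y = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)                      # input sample at or before pos
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        y.append(float(x[i] + frac * (nxt - x[i])))
    return y
```

For example, `frame_samples(6400)`, `frame_samples(6000)` and `frame_samples(8000)` give 128, 120 and 160 samples per 20 ms frame, and `decimate_linear` turns a 160-sample block at 8Ksample/sec into 128 samples at 6.4Ksample/sec.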

Referring to Figure 3, the decimated digital samples are shown as speech entering the transmitter block 1100. The transmitter block is the mode of operation in which the voice control DSP/CODEC circuit 306 receives local digitized voice information, compresses it, and packetizes it for transfer to the main controller circuit 313 for transmission on the telephone line. The telephone line connected to telephone line interface 309 of Figure 2 corresponds to the channel 1111 of Figure 3.

The frame rate for the voice compression algorithm is 20 milliseconds of speech per compression. This corresponds to 128 samples to process per frame at the 6.4Ksample/sec decimated sampling rate. When 128 samples have accumulated in the queue of the internal DSP RAM, compression of that sample frame begins.
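A minimal sketch of the frame accumulation, together with the later split of each frame into segments and vectors, assuming the 9.6Kbit/sec sizes given in this section (128-sample frames, four 32-sample segments, 8-sample vectors). The class and function names are hypothetical.

```python
from typing import List, Optional

FRAME_SIZE = 128      # 20 ms at 6.4 Ksample/sec
SUB_BLOCK_SIZE = 32   # 5 ms segment
VSIZE = 8             # samples per code book vector


class FrameQueue:
    """Accumulate decimated samples and release complete frames."""

    def __init__(self) -> None:
        self._pending: List[int] = []

    def push(self, sample: int) -> Optional[List[int]]:
        """Queue one sample; return the frame once FRAME_SIZE samples are held."""
        self._pending.append(sample)
        if len(self._pending) == FRAME_SIZE:
            frame, self._pending = self._pending, []
            return frame          # compression of this frame can begin
        return None


def split(block: List[int], size: int) -> List[List[int]]:
    """Split a block into consecutive pieces of `size` samples."""
    return [block[i:i + size] for i in range(0, len(block), size)]
```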
Data Flow Description
The voice control DSP/CODEC circuit 306 is programmed to first remove the DC component 1101 of the incoming speech. The DC removal is an adaptive function that establishes a center baseline on the voice signal by digitally adjusting the values of the PCM data. This corresponds to the DC removal stage 1203 of the software flow chart of Figure 4. The formula for removal of the DC bias or drift is as follows:

$$x(n) = s(n) - s(n-1) + a \cdot x(n-1), \qquad a = \frac{32735}{32768}$$

where n is the sample number, s(n) is the current sample, and x(n) is the sample with the DC bias removed.

The DC removal is performed over the 20 millisecond frame of voice, which amounts to 128 samples at the 6.4Ksample/sec decimated sampling rate used by the 9.6Kbit/sec algorithm. The value of a was selected by empirical observation to provide the best result.
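A floating-point sketch of the adaptive DC removal filter follows. It assumes the filter state, s(n-1) and x(n-1), starts at zero and carries over from frame to frame; the form of a (32735/32768) suggests a Q15 fixed-point implementation on the DSP, but floating point is used here for readability. The class name is hypothetical.

```python
from typing import List

A = 32735 / 32768  # pole of the DC-removal filter, from the text


class DCRemover:
    """x(n) = s(n) - s(n-1) + a * x(n-1), with state kept across frames."""

    def __init__(self) -> None:
        self.s_prev = 0.0   # s(n-1); assumed to start at zero
        self.x_prev = 0.0   # x(n-1); assumed to start at zero

    def process(self, frame: List[float]) -> List[float]:
        out = []
        for s in frame:
            x = s - self.s_prev + A * self.x_prev
            self.s_prev, self.x_prev = s, x
            out.append(x)
        return out
```

Because a is just below one, a constant offset produces a single step at the input that then decays geometrically, so a steady DC level is removed within a few thousand samples.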

Referring again to Figure 4, a control flow diagram of the voice compression algorithm is shown which will assist in understanding the block diagram of Figure 3. Figure 6 is a simplified data flow description of the flow chart of Figure 4 showing the sample rate decimator 1241 and the sample rate incrementor 1242. Sample rate decimator 1241 produces an output 1251 of 6.4Ksample/sec for an 8Ksample/sec input in the 9.6Kbit/sec system. (Similarly, a 6Ksample/sec output 1250 is produced for the 8Kbit/sec algorithm, and no decimation is performed on the 8Ksample/sec voice sample rate 1252 for the 16Kbit/sec algorithm.) The analysis and compression begin at block 1201, where the 13-bit linear PCM speech samples are accumulated until 128 samples (for the 6.4Ksample/sec decimated sampling rate), representing 20 milliseconds or one frame of voice, are passed to the DC removal portion of code operating within the programmed voice control DSP/CODEC circuit 306. The DC removal portion of the code described above approximates the baseline of the frame of voice by using an adaptive DC removal technique.

A silence detection algorithm 1205 is also included in the programmed code of the DSP/CODEC 306. The silence detection function is a summation of the square of each sample of the voice signal over the frame. If the power of the voice frame falls below a preselected threshold, the frame is treated as silent. Detecting silent frames of speech is important for the later multiplexing of the V-data (voice data) and C-data (asynchronous computer data) described below: during silent portions of the speech, the main controller circuit 313 transfers conventional digital data (C-data) over the telephone line in lieu of voice data (V-data). The formula for computing the power is

$$PWR = \sum_{n=0}^{\text{Sub-Block Size}-1} x(n) \cdot x(n)$$

where n is the sample number and x(n) is the sample value.

If the power PWR is lower than a preselected threshold, the present voice frame is flagged as containing silence. The 128-sample (decimated) silent frame is still processed by the voice compression algorithm; however, the silent frame packets are discarded by the main controller circuit 313 so that asynchronous digital data may be transferred in lieu of voice data. The rest of the voice compression operates on segments, with four segments per frame amounting to 32 samples of data per segment (Sub-Block Size). Only the DC removal and silence detection are performed over an entire 20 millisecond frame.
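A sketch of the power computation and silence flag. The text applies the decision to a whole frame while the printed formula's upper limit is Sub-Block Size - 1, so this version simply sums over whatever block it is given. The threshold value is hypothetical, since the text only calls it "preselected".

```python
from typing import List

SILENCE_THRESHOLD = 1.0e5  # hypothetical; the text gives no numeric value


def block_power(x: List[float]) -> float:
    """PWR = sum of x(n) * x(n) over the block."""
    return sum(v * v for v in x)


def is_silent(x: List[float], threshold: float = SILENCE_THRESHOLD) -> bool:
    """Flag the block as silent when its power falls below the threshold."""
    return block_power(x) < threshold
```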

The pre-emphasis 1207 of the voice compression algorithm shown in Figure 4 is the next step. The sub-blocks are first passed through a pre-emphasis stage which whitens the spectral content of the speech signal by balancing the extra energy in the low band with the reduced energy in the high band. The pre-emphasis essentially flattens the signal by reducing its dynamic range. Because the flattened signal occupies a smaller range, less signal range is required for compression, making the compression algorithm operate more efficiently. The formula for the pre-emphasis is

$$\tilde{x}(n) = x(n) - p \cdot x(n-1), \qquad p = 0.5 \text{ for the 9.6Kbit/sec algorithm}$$

where n is the sample number, x(n) is the sample, and \tilde{x}(n) is the pre-emphasized sample.

Each segment thus amounts to five milliseconds of voice, which is equal to 32 samples. Pre-emphasis is then performed on each segment. The value of p was selected by empirical observation to provide the best result.
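A sketch of per-segment pre-emphasis, assuming the filter state (the last input sample of the previous segment) is carried from one 32-sample segment to the next and starts at zero. The function name is hypothetical.

```python
from typing import List, Tuple

P = 0.5  # pre-emphasis coefficient for the 9.6Kbit/sec algorithm


def preemphasize(segment: List[float], prev: float = 0.0) -> Tuple[List[float], float]:
    """Apply x~(n) = x(n) - p * x(n-1) to one segment.

    `prev` is x(n-1) for the first sample; the new state is returned so the
    caller can pass it to the next segment.
    """
    out = []
    for x in segment:
        out.append(x - P * prev)
        prev = x
    return out, prev
```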

The next step is the long-term prediction (LTP). Long-term prediction is a method of detecting the innovation in the voice signal. Since the voice signal contains many redundant voice segments, these redundancies can be detected so that only information about the changes in the signal from one segment to the next needs to be sent. This is accomplished by comparing the speech samples of the current segment on a sample-by-sample basis to the reconstructed speech samples from the previous segments to obtain the innovation information and an indicator of the error in the prediction.

The long-term predictor gives the pitch and the LTP-Gain of the sub-block, which are encoded in the transmitted bit stream. In order to predict the pitch in the current segment, at least 3 past sub-blocks of reconstructed speech are needed. This gives a pitch value in the range of MIN_PITCH to MAX_PITCH (32 and 95, respectively, as given in Table 14). This value is coded with 6 bits. However, in order to fit the compressed data rate within a 9600 bps link, the pitch for segments 0 and 3 is encoded with 6 bits, while the pitch for segments 1 and 2 is encoded with 5 bits. When predicting the pitch for segments 1 and 2, the correlation lag is adjusted around the predicted pitch value of the previous segment. This gives a good chance of predicting the correct pitch for the current segment even though the entire range for prediction is not used. The computations for the long-term correlation lag PITCH and the associated LTP-gain factor β_j (where j = 0, 1, 2, 3 corresponding to each of the four segments of the frame) are done as follows:

For j = min_pitch, ..., max_pitch, first perform the following computations between the current speech samples x(n) and the past reconstructed speech samples x'(n):

$$S_{xx'}(j) = \sum_{i=0}^{\text{Sub-Block Size}-1} x(i) \cdot x'(i + \text{MAX\_PITCH} - j)$$

$$S_{x'x'}(j) = \sum_{i=0}^{\text{Sub-Block Size}-1} x'(i + \text{MAX\_PITCH} - j) \cdot x'(i + \text{MAX\_PITCH} - j)$$

The pitch j is chosen as the value that maximizes

$$\frac{S_{xx'}^{2}(j)}{S_{x'x'}(j)}$$

Since β_j is positive, only values of j with positive S_{xx'}(j) are considered.

For the 9.6Kbit/sec and 8Kbit/sec embodiments, the pitch is encoded with a different number of bits for each sub-segment, and the values of min_pitch and max_pitch (the range of the synthesized speech used for pitch prediction of the current segment) are computed as follows:

if (seg_number == 0 || seg_number == 3) {
    min_pitch = MIN_PITCH;
    max_pitch = MAX_PITCH;
}
if (seg_number == 1 || seg_number == 2) {
    min_pitch = prev_pitch - 15;
    if (prev_pitch < MIN_PITCH + 15)
        min_pitch = MIN_PITCH;
    if (prev_pitch > MAX_PITCH - 15)
        min_pitch = MAX_PITCH - 30;
    max_pitch = min_pitch + 30;
}
(This calculation is not necessary for the 16Kbit/sec algorithm.) The prev_pitch parameter in the above code is the pitch of the previous sub-segment. The pitch j is then encoded in 6 bits or 5 bits as:

encoded bits = j - min_pitch

The LTP-Gain is given by

$$\beta = \frac{S_{xx'}(j)}{S_{x'x'}(j)}$$

The value of β is a normalized quantity between zero and unity for this segment, where β is an indicator of the correlation between the segments. For example, a perfect sine wave would produce a β close to unity, since the current segment and the previous reconstructed segments would be an almost perfect match.
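The lag search and gain computation above can be sketched as follows, assuming x' is held as a buffer of MAX_PITCH reconstructed samples (oldest first), so that x'(i + MAX_PITCH - j) is simply an index into that buffer. The function names are hypothetical, and this is an illustration rather than the DSP code itself.

```python
from typing import List, Optional, Tuple

MIN_PITCH = 32
MAX_PITCH = 95
SUB_BLOCK_SIZE = 32


def pitch_range(seg_number: int, prev_pitch: int) -> Tuple[int, int]:
    """Lag search range for the 9.6 and 8 Kbit/sec algorithms."""
    if seg_number in (0, 3):
        return MIN_PITCH, MAX_PITCH           # full 6-bit range
    lo = prev_pitch - 15                      # 5-bit window around prev_pitch
    if prev_pitch < MIN_PITCH + 15:
        lo = MIN_PITCH
    if prev_pitch > MAX_PITCH - 15:
        lo = MAX_PITCH - 30
    return lo, lo + 30


def ltp_search(x: List[float], xr: List[float], lo: int, hi: int
               ) -> Optional[Tuple[int, float, int]]:
    """Choose the lag j in [lo, hi] maximizing S_xx'(j)^2 / S_x'x'(j).

    x  : current segment (SUB_BLOCK_SIZE samples)
    xr : reconstructed speech x' (MAX_PITCH samples, oldest first)
    Returns (j, beta, encoded_bits), or None if no lag correlates positively.
    """
    best = None
    best_score = 0.0
    for j in range(lo, hi + 1):
        lagged = [xr[i + MAX_PITCH - j] for i in range(SUB_BLOCK_SIZE)]
        s_xxr = sum(a * b for a, b in zip(x, lagged))   # S_xx'(j)
        s_xrxr = sum(b * b for b in lagged)              # S_x'x'(j)
        if s_xxr <= 0 or s_xrxr == 0:
            continue                                      # only positive beta
        score = s_xxr * s_xxr / s_xrxr
        if score > best_score:
            best_score = score
            best = (j, s_xxr / s_xrxr, j - lo)            # beta = S_xx' / S_x'x'
    return best
```

Note that S_xx'/S_x'x' can exceed one for a given lag; the quantization tables that follow map any such value to the top code.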

The LTP gain factor is quantized using an LTP Gain Encode Table, characterized in Table 15. The resulting index (bcode) is transmitted to the far end. At the receiver, the LTP gain factor is retrieved from Table 16 as follows:

β_q = dlb_tab[bcode]

TABLE 15: LTP Gain Encode Table

β:      below 0.1 | 0.1 to 0.3 | 0.3 to 0.5 | 0.5 to 0.7 | 0.7 to 0.9 | 0.9 and above
bcode:  0         | 1          | 2          | 3          | 4          | 5

TABLE 16: LTP Gain Decode Table

bcode:  0   | 1   | 2   | 3   | 4   | 5
β_q:    0.0 | 0.2 | 0.4 | 0.6 | 0.8 | 1.0
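A sketch of the quantizer implied by Tables 15 and 16, assuming decode levels of 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0, which sit midway between the encode boundaries. Which side of a boundary a value exactly equal to it falls on is not stated in the tables; this sketch assigns it to the upper code.

```python
import bisect

BETA_BOUNDS = [0.1, 0.3, 0.5, 0.7, 0.9]     # Table 15 interval boundaries
DLB_TAB = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]    # Table 16 decode levels (dlb_tab)


def encode_beta(beta: float) -> int:
    """Map an LTP gain to its index bcode (0..5)."""
    return bisect.bisect_right(BETA_BOUNDS, beta)


def decode_beta(bcode: int) -> float:
    """Receiver side: beta_q = dlb_tab[bcode]."""
    return DLB_TAB[bcode]
```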

After the long-term prediction, the signal is passed through a pitch filter to whiten it so that all the pitch effects are removed. The pitch filter is given by:

$$e(n) = x(n) - \beta_q \cdot x'(n - j)$$

where j is the lag and β_q is the associated quantized LTP gain.
Next, the error signal is normalized with respect to the maximum amplitude in the sub-segment for vector quantization of the error signal. The maximum amplitude in the segment is obtained as follows:

$$G = \max_{n} \{ |e(n)| \}$$

The maximum amplitude (G) is encoded using the Gain Encode Table, characterized in Table 17. The encoded amplitude (gcode) is transmitted to the far end. At the receiver, the maximum amplitude is retrieved from Table 18 as follows:

G_q = dlg_tab[gcode]

The error signal e(n) is then normalized by

$$\hat{e}(n) = \frac{e(n)}{G_q}$$

TABLE 17: Gain Encode Table

G:      16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192
gcode:   0 |  1 |  2 |   3 |   4 |   5 |    6 |    7 |    8 |    9

TABLE 18: Gain Decode Table

gcode:   0 |  1 |  2 |   3 |   4 |   5 |    6 |    7 |    8 |    9
G_q:    16 | 32 | 64 | 128 | 256 | 512 | 1024 | 2048 | 4096 | 8192
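A sketch of the gain normalization. The exact encode intervals of Table 17 are not recoverable from the text, so this assumes gcode selects the smallest table level not less than G, which keeps the normalized error within ±1 except for segments louder than the top level. The table name follows the decode formula above; the helper names are hypothetical.

```python
import bisect
from typing import List, Tuple

DLG_TAB = [16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192]  # Table 18


def encode_gain(g: float) -> int:
    """Assumed rule: smallest level >= G, clamped to the largest level."""
    return min(bisect.bisect_left(DLG_TAB, g), len(DLG_TAB) - 1)


def normalize(e: List[float]) -> Tuple[int, List[float]]:
    """Return gcode and the normalized error e(n) / G_q."""
    g = max(abs(v) for v in e)
    gcode = encode_gain(g)
    gq = DLG_TAB[gcode]
    return gcode, [v / gq for v in e]
```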

From the Gain and LTP Gain Encode tables, 4 bits would be required for gcode and 3 bits for bcode, a total of 7 bits for both parameters. However, there are only 10 × 6 = 60 possible combinations, which fit within the 64 values of a 6-bit field. In order to reduce the bandwidth of the compressed bit stream, the gcode and bcode parameters are therefore encoded together in 6 bits, as follows:

BGCODE = 6 * gcode + bcode

The encoded bits for G and the LTP-Gain (β) can be recovered at the receiver as follows:

gcode = BGCODE / 6   (integer division)
bcode = BGCODE - 6 * gcode

However, these calculations are needed only for the 8Kbit/sec and 9.6Kbit/sec algorithms.
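The joint Beta/Gain code and its inverse can be sketched directly from the formulas above; the function names are hypothetical.

```python
from typing import Tuple


def pack_bg(gcode: int, bcode: int) -> int:
    """Joint Beta/Gain code: BGCODE = 6 * gcode + bcode (fits in 6 bits)."""
    if not (0 <= gcode <= 9 and 0 <= bcode <= 5):
        raise ValueError("gcode must be 0..9 and bcode 0..5")
    return 6 * gcode + bcode


def unpack_bg(bgcode: int) -> Tuple[int, int]:
    """Receiver side: gcode = BGCODE / 6 (integer division), bcode = remainder."""
    gcode = bgcode // 6
    return gcode, bgcode - 6 * gcode
```

The largest code, `pack_bg(9, 5)`, is 59, so every combination fits in 6 bits.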

Each segment of 32 samples (Sub-Block Size) is divided into 4 vectors of 8 samples (VSIZE) each. Each vector is compared to the vectors stored in the CodeBook, and the index of the code vector closest to the signal vector is selected. The CodeBook consists of 512 entries (512 addresses). The chosen index has the least difference according to the following minimization formula:

$$\min_{y} \left\{ \sum_{i=0}^{\text{VSIZE}-1} (x_i - y_i)^2 \right\}$$

where x_i is the input vector of VSIZE samples (8 for the 9.6Kbit/sec algorithm), and y_i is the code book vector of VSIZE samples (8 for the 9.6Kbit/sec algorithm).
The minimization computation that finds the best match between the subsegment and the code book entries is computationally intensive. A brute-force comparison may exceed the available machine cycles if real-time processing is to be accomplished. Thus, some shorthand processing approaches are taken to reduce the computations required to find the best fit.

By expanding the above formula, some unnecessary terms may be removed and some fixed terms may be precomputed:

$$(x_i - y_i)^2 = (x_i - y_i)(x_i - y_i) = x_i^2 - x_i y_i - y_i x_i + y_i^2 = x_i^2 - 2 x_i y_i + y_i^2$$

where x_i^2 is the same for every code book entry, so it may be dropped from the formula, and the value of -½ Σ y_i^2 may be precomputed and stored as the (VSIZE + 1)th value (8 + 1 = 9th value for the 9.6Kbit/sec algorithm) in the code book. Dividing the remaining terms by -2 turns the minimization into a maximization, so the only real-time computation involved is the following formula:

$$\max_{y} \left\{ \sum_{i=0}^{\text{VSIZE}-1} x_i y_i - \frac{1}{2} \sum_{i=0}^{\text{VSIZE}-1} y_i^2 \right\}$$

Thus, for a segment of Sub-Block Size samples (32 for the 9.6Kbit/sec algorithm), Sub-Block Size/VSIZE CodeBook indices are transmitted (4 CodeBook indices, 9 bits each, for the 9.6Kbit/sec algorithm). Therefore, for the 9.6Kbit/sec algorithm, 36 bits are transmitted to represent each Sub-Block Size segment.
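A sketch of the shortened code book search, assuming each code book row stores the VSIZE vector values followed by the precomputed -½ Σ y_i^2 term. The helper names and the tiny three-entry code book in the example are hypothetical; the real code book has 512 entries.

```python
from typing import List, Sequence

VSIZE = 8


def make_entry(y: Sequence[float]) -> List[float]:
    """Code book row: the vector plus its precomputed -1/2 * sum(y_i^2)."""
    return list(y) + [-0.5 * sum(v * v for v in y)]


def vq_search(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index maximizing sum(x_i * y_i) - 1/2 sum(y_i^2).

    This selects the same entry as minimizing sum((x_i - y_i)^2), using only
    VSIZE multiply-accumulates and one add per entry.
    """
    best_index, best_score = 0, float("-inf")
    for index, entry in enumerate(codebook):
        score = sum(x[i] * entry[i] for i in range(VSIZE)) + entry[VSIZE]
        if score > best_score:
            best_index, best_score = index, score
    return best_index


def encode_segment(segment: Sequence[float], codebook) -> List[int]:
    """Split a 32-sample segment into 4 vectors; return one index per vector."""
    return [vq_search(segment[k:k + VSIZE], codebook)
            for k in range(0, len(segment), VSIZE)]
```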

After the appropriate index into the code book is chosen, the input speech samples are replaced by the corresponding vectors at the chosen indexes. These values are then multiplied by G_q to denormalize the synthesized error signal, e'(n). This signal is then passed through the Inverse Pitch Filter to reintroduce the pitch effects that were removed by the pitch filter. The Inverse Pitch Filter is performed as follows:

$$y(n) = e'(n) + \beta_q \cdot x'(n - j)$$

where β_q is the quantized LTP-Gain from Table 16, and j is the lag.

The Inverse Pitch Filter output is used to update the synthesized speech buffer, which is used for the analysis of the next sub-segment. The state buffer is updated as follows:

$$x'(k) = x'(k + \text{MIN\_PITCH}), \qquad k = 0, \ldots, (\text{MAX\_PITCH} - \text{MIN\_PITCH}) - 1$$

$$x'(l) = y(n), \qquad l = \text{MAX\_PITCH} - \text{MIN\_PITCH}, \ldots, \text{MAX\_PITCH} - 1$$
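The pitch filter, its inverse, and the state update can be sketched together. This assumes, as in the correlation formulas, that x'(n - j) for the current sample n lives at index n + MAX_PITCH - j of a MAX_PITCH-sample buffer, and that each segment is MIN_PITCH (32) samples long, so the shift-and-append keeps the buffer length constant. The function names are hypothetical.

```python
from typing import List

MIN_PITCH = 32
MAX_PITCH = 95


def pitch_filter(x: List[float], xr: List[float], j: int, beta_q: float) -> List[float]:
    """Analysis: e(n) = x(n) - beta_q * x'(n - j)."""
    return [x[n] - beta_q * xr[n + MAX_PITCH - j] for n in range(len(x))]


def inverse_pitch_filter(e_hat: List[float], xr: List[float], j: int,
                         beta_q: float) -> List[float]:
    """Synthesis: y(n) = e'(n) + beta_q * x'(n - j)."""
    return [e_hat[n] + beta_q * xr[n + MAX_PITCH - j] for n in range(len(e_hat))]


def update_state(xr: List[float], y: List[float]) -> List[float]:
    """x'(k) = x'(k + MIN_PITCH) for the retained part, then x'(l) = y(n).

    Assumes len(y) == MIN_PITCH, so the buffer keeps MAX_PITCH samples.
    """
    keep = MAX_PITCH - MIN_PITCH
    return xr[MIN_PITCH:MIN_PITCH + keep] + list(y)
```

Running the same update in the encoder and the decoder keeps both sides' x' buffers identical, which is what allows the decoder to reproduce the analysis exactly.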
The signal is then passed through the de-emphasis filter, since pre-emphasis was performed at the beginning of the processing. In the analysis, only the pre-emphasis state is updated so that the Analysis-by-Synthesis method of performing the compression is properly satisfied. In the synthesis, the output of the de-emphasis filter, s'(n), is passed on to the D/A to generate analog speech. The de-emphasis filter is implemented as follows:

$$s'(n) = y(n) + p \cdot s'(n-1), \qquad p = 0.5 \text{ for the 9.6Kbit/sec algorithm}$$
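A sketch of the de-emphasis filter, assuming its state s'(n-1) starts at zero and carries across segments. With the same coefficient p, it exactly undoes the pre-emphasis x~(n) = x(n) - p·x(n-1). The function name is hypothetical.

```python
from typing import List, Tuple

P = 0.5  # same coefficient as the pre-emphasis stage


def deemphasize(y: List[float], prev_out: float = 0.0) -> Tuple[List[float], float]:
    """s'(n) = y(n) + p * s'(n-1); returns the output and the filter state."""
    out = []
    for v in y:
        prev_out = v + P * prev_out
        out.append(prev_out)
    return out, prev_out
```

For example, the pre-emphasized sequence [2, 1, 1] (from a constant input of 2) is restored to [2, 2, 2].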

The voice is reconstructed at the receiving end of the voice-over-data link according to the reverse of the compression algorithm, shown as the decompression algorithm in Figure 5.

If a silence frame is received, the decompression algorithm simply discards the received frame and initializes the output with zeros. If a speech frame is received, the pitch, LTP-Gain, and GAIN are decoded as explained above. The error signal is reconstructed from the codebook indexes and then denormalized with respect to the GAIN value. This signal is then passed through the inverse pitch filter to generate the reconstructed signal. The pitch and LTP-Gain are the decoded values, the same as those used in the analysis. The filtered signal is passed through the de-emphasis filter, whose output is passed on to the D/A to put out analog speech.

The compressed frame contains 23 8-bit words and one 6-bit word, for a total of 24 words. The total number of bits transferred is 23 × 8 + 6 = 190 bits per 20 millisecond frame, which corresponds to 9500 bps, as shown in Table 19 (for the 9.6Kbit/sec algorithm).
Table 19: Compressed Frame Packet for 9.6Kbit/sec Algorithm
where BG = Beta/Gain, P = Pitch, VQ = CodeBook Index and S = Spare Bits
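The 190-bit figure can be reproduced from the per-segment fields described earlier. This tally assumes each of the four segments carries one 6-bit BG code, one pitch code (6 bits for segments 0 and 3, 5 bits for segments 1 and 2), and four 9-bit VQ indices, which is consistent with the field legend of Table 19; spare bits S are not counted.

```python
VQ_BITS = 4 * 9              # four 9-bit code book indices per segment
BG_BITS = 6                  # joint Beta/Gain code per segment
PITCH_BITS = [6, 5, 5, 6]    # pitch code width for segments 0..3
FRAME_MS = 20


def frame_bits() -> int:
    """Bits in one compressed 20 ms frame (spare bits excluded)."""
    return sum(VQ_BITS + BG_BITS + p for p in PITCH_BITS)


def bit_rate_bps() -> float:
    """Bits per frame divided by the 20 ms frame period."""
    return frame_bits() * 1000 / FRAME_MS
```

The tally gives 2 × 48 + 2 × 47 = 190 bits, matching the 23 eight-bit words plus one six-bit word, and 190 bits every 20 ms is 9500 bps.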



Code Book Descriptions
Table 20 describes the format of the code book for the 9.6Kbit/sec algorithm. The code book values are stored in a signed floating-point format, which is converted to a fixed-point representation when stored in the lookup tables of the present invention. There are 512 entries in each code book, corresponding to 512 different speech segments which can be used to encode and reconstruct the speech.

Table 20: Code Book Format for the 9.6Kbit/sec Algorithm

| Code Book Entries (8 entries) | -½ Sum² Constant (1 entry) |

For the 9.6Kbit/sec algorithm, the code book comprises a table of nine columns and 512 rows of floating-point data. The first 8 columns correspond to the 8 samples of speech, and the ninth entry is the precomputed constant described above, -½ Σ y_i^2. An example of the code book data is shown in Table 21 with the complete code book for the 9.6Kbit/sec algorithm.
The code books are stored in PROM memory accessible by the Voice DSP as a lookup table. The table data is loaded into local DSP memory upon selection of the appropriate algorithm to increase access speed. The code books comprise a table of data in which each entry is a sequential address from 000 to 511. For the 9.6Kbit/sec algorithm, a 9 × 512 code book is used. For the 16Kbit/sec algorithm, a 6 × 256 code book is used, and for the 8Kbit/sec algorithm, a 9 × 512 code book is used. Depending upon which voice compression quality and compression rate is selected, the corresponding code book is used to encode/decode the speech samples.

The code books are generated statistically by encoding a wide variety of speech patterns. The code books are generated in a learning mode for the above-described algorithm, in which each speech segment that the compression algorithm is first exposed to is placed in the code book until 512 entries are recorded. The algorithm is then continually fed a variety of speech patterns, upon which the code book is adjusted. As new speech segments are encountered, the code book is searched to find the best match. If the error between the observed speech segment and the code book values exceeds a predetermined threshold, then the closest speech segment in the code book and the new speech segment are averaged, and the new average is placed in the code book in place of the closest match. In this learning mode, the code book is continually adjusted to have the lowest difference ratio between observed speech segment values and code book values. The learning mode of operation may take hours or days of exposure to different speech patterns to adjust the code books to the best fit.

The code books may be exposed to a single person's speech, which will result in a code book tailored to that particular person's method of speaking. For a mass-market sale of this product, the speech patterns of a wide variety of speakers of both genders are exposed to the code book learning algorithm for the average fit for a given language. For other languages, it is best to expose the algorithm to speech patterns of only one language, such as English or Japanese.
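The learning mode can be sketched as follows. This is an illustration under stated assumptions: the error measure is taken to be the squared error used in the code book search, and the threshold value is hypothetical since the text only calls it predetermined. After training, the precomputed -½ Σ y_i^2 column would be recomputed for each entry.

```python
from typing import List, Sequence


def sq_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared error between a speech segment and a code book entry."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def train(segments: Sequence[Sequence[float]], size: int = 512,
          threshold: float = 1.0) -> List[List[float]]:
    """Learning-mode sketch: seed the code book, then average poor matches in."""
    book: List[List[float]] = []
    for seg in segments:
        seg = list(seg)
        if len(book) < size:
            book.append(seg)            # first segments seed the code book
            continue
        k = min(range(len(book)), key=lambda i: sq_error(seg, book[i]))
        if sq_error(seg, book[k]) > threshold:
            # replace the closest entry with the average of it and the new segment
            book[k] = [(u + v) / 2 for u, v in zip(book[k], seg)]
    return book
```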

Voice Over Data Packet Protocol 
As described above, the present system can transmit voice data and conventional data concurrently by using time multiplex technology. The digitized voice data, called V-data, carries the speech information. The conventional data is referred to as C-data. The V-data and C-data multiplex transmission is achieved in two modes at two levels: the transmit and receive modes, and the data service level and multiplex control level. This operation is shown diagrammatically in Figure 7.

In transmit mode, the main controller circuit 313 of Figure 2 operates in the data service level 1505 to collect and buffer data from both the personal computer 10 (through the RS232 port interface 315) and the voice control DSP 306. In multiplex control level 1515, the main controller circuit 313 multiplexes the data and transmits that data out over the phone line 1523. In the receive mode, the main controller circuit 313 operates in the multiplex control level 1515 to de-multiplex the V-data packets and the C-data packets and then operates in the data service level 1505 to deliver the appropriate data packets to the correct destination: the personal computer 10 for the C-data packets or the voice control DSP circuit 306 for V-data.

Transmit Mode 

In transmit mode, there are two data buffers, the V-data buffer 1511 and the C-data buffer 1513, implemented in the main controller RAM 316 and maintained by main controller 313. When the voice control DSP circuit 306 engages voice operation, it sends a block of V-data every 20 ms to the main controller circuit 313 through dual port RAM circuit 308. Each V-data block has one sign byte as a header and 24 bytes of V-data. The sign byte header of the voice packet is transferred every frame from the voice control DSP to the controller 313. The sign byte header contains the sign byte, which identifies the contents of the voice packet. The sign byte is defined as follows:

00 hex = the following V-data contains silent sound
01 hex = the following V-data contains speech information

If the main controller 313 is in transmit mode for V-data/C-data multiplexing, the main controller circuit 313 operates at the data service level to perform the following tests. When the voice control DSP circuit 306 starts to send the 24-byte V-data packet through the dual port RAM to the main controller circuit 313, the main controller checks whether the V-data buffer has room for 24 bytes. If there is sufficient room in the V-data buffer, the main controller checks the sign byte in the header preceding the V-data packet. If the sign byte is equal to one (indicating voice information in the packet), the main controller circuit 313 puts the following 24 bytes of V-data into the V-data buffer and clears the silence counter to zero. Then the main controller 313 sets a flag to request that the V-data be sent by the main controller at the multiplex control level.

If the sign byte is equal to zero (indicating silence in the V-data packet), the main controller circuit 313 increments the silence counter by 1 and checks whether the silence counter has reached 5. When the silence counter reaches 5, the main controller circuit 313 does not put the following 24 bytes of V-data into the V-data buffer and stops incrementing the silence counter. By this method, the main controller circuit 313 operating at the service level provides only non-silence V-data to the multiplex control level, while discarding silence V-data packets and preventing the V-data buffer from being overwritten.
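The service-level tests can be sketched as a small state machine. This assumes that silent packets arriving before the counter reaches 5 are buffered like speech packets (the text implies this but does not state it), and the buffer capacity is a hypothetical parameter.

```python
SIGN_SILENCE = 0x00
SIGN_SPEECH = 0x01
PACKET_BYTES = 24
SILENCE_LIMIT = 5


class VDataService:
    """Data-service-level handling of V-data packets from the voice DSP."""

    def __init__(self, capacity: int) -> None:
        self.buffer = bytearray()       # V-data buffer
        self.capacity = capacity        # hypothetical buffer size in bytes
        self.silence_counter = 0
        self.send_requested = False

    def on_packet(self, sign: int, payload: bytes) -> None:
        if len(self.buffer) + PACKET_BYTES > self.capacity:
            return                      # no room: the packet is not buffered
        if sign == SIGN_SPEECH:
            self.buffer += payload
            self.silence_counter = 0
            self.send_requested = True  # ask the multiplex control level to send
        elif sign == SIGN_SILENCE:
            if self.silence_counter < SILENCE_LIMIT:
                self.silence_counter += 1        # stops counting at the limit
            if self.silence_counter < SILENCE_LIMIT:
                self.buffer += payload           # short pauses still buffered
```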

The operation of the main controller circuit 313 in the multiplex control level is to multiplex the V-data and C-data packets and transmit them through the same channel. At this control level, both types of data packets are transmitted by the HDLC protocol, in which data is transmitted in synchronous mode and checked by CRC error checking. If a V-data packet is received at the remote end with a bad CRC, it is discarded, since 100% accuracy of the voice channel is not guaranteed. If the V-data packets were re-sent in the event of corruption, the real-time quality of the voice transmission would be lost. In addition, the C-data is transmitted following a modem data communication protocol such as CCITT V.42.

In order to identify the V-data block, which assists the main controller circuit 313 in multiplexing the packets for transmission at this level and assists the remote site in recognizing and de-multiplexing the data packets, a V-data block is defined which includes a maximum of five V-data packets. The V-data block parameters are defined as follows:

The V-data block header = 80h;
The V-data block size = 24 (bytes per V-data packet);
The maximum V-data block size = 5 (V-data packets per block);
The V-data block has higher transmission priority than C-data to ensure the integrity of the real-time voice transmission. Therefore, the main controller circuit 313 checks the V-data buffer first to determine whether it will transmit V-data or C-data blocks. If the V-data buffer holds more than 69 bytes of V-data, a transmit block counter is set to 5 and the main controller circuit 313 starts to transmit V-data from the V-data buffer through the data pump circuit 311 onto the telephone line. Since the transmit block counter indicates that 5 blocks of V-data will be transmitted in a continuous stream, the transmission stops either after the 115 bytes of V-data have been sent or when the V-data buffer is empty. If the V-data buffer holds more than 24 bytes of V-data, the transmit block counter is set to 1 and transmission of V-data begins; the main controller circuit then transmits only one block of V-data. If the V-data buffer holds less than 24 bytes of V-data, the main controller circuit services the transmission of C-data.

During the transmission of a C-data block, the V-data buffer condition is checked before transmitting the first C-data byte. If the V-data buffer contains more than one V-data packet, the current transmission of the C-data block is terminated in order to handle the V-data.
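The priority decision can be sketched as a pair of checks. The 69-byte threshold is kept as printed in the text. Treating a buffer holding exactly 24 bytes as one sendable block is an assumption, since the text covers only "more than" and "less than" 24 bytes. The function names are hypothetical.

```python
PACKET_BYTES = 24
MAX_BLOCKS = 5
FIVE_BLOCK_THRESHOLD = 69   # "more than 69 bytes", as printed in the text


def blocks_to_send(vdata_len: int) -> int:
    """Transmit block counter value; 0 means service C-data instead."""
    if vdata_len > FIVE_BLOCK_THRESHOLD:
        return MAX_BLOCKS
    if vdata_len >= PACKET_BYTES:          # assumption: exactly 24 bytes counts
        return 1
    return 0


def should_preempt_cdata(vdata_len: int) -> bool:
    """Checked before the first byte of a C-data block is transmitted."""
    return vdata_len > PACKET_BYTES        # more than one V-data packet waiting
```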
Receive Mode

On the receiving end of the telephone line, the main controller circuit 313 operates at the multiplex control level to de-multiplex received data into V-data and C-data. The type of block can be identified by checking the first byte of the incoming data blocks. Before receiving a block of V-data, the main controller circuit 313 initializes a receive V-data byte counter, a receive block counter, a backup pointer, and a temporary V-data buffer pointer. The receive V-data byte counter is set to 24, the receive block counter is set to 0, and the backup pointer is set to the same value as the V-data receive buffer pointer. If the received byte is not equal to 80 hex (80h indicating a V-data packet), the receive operation follows the current modem protocol, since the data block must contain C-data. If the received byte is equal to 80h, the main controller circuit 313 operating in receive mode processes the V-data.

For a V-data block, when a byte of V-data is received, the byte is put into the V-data receive buffer, the temporary buffer pointer is increased by 1, and the receive V-data counter is decreased by 1. If the V-data counter is down to zero, the value of the temporary V-data buffer pointer is copied into the backup pointer buffer. The total V-data counter is increased by 24, and the receive V-data counter is reset to 24. The receive block counter is increased by 1. A flag to request service of V-data is then set. If the receive block counter has reached 5, the main controller circuit 313 does not put the incoming V-data into the V-data receive buffer but throws it away. If the total V-data counter has reached its maximum value, the receiver likewise does not put the incoming V-data into the V-data receive buffer but throws it away.

At the end of the block, which is indicated by receipt of the CRC check bytes, the main controller circuit 313 operating in the multiplex control level does not check the result of the CRC but instead checks the value of the receive V-data counter. If the value is zero, the check is finished; otherwise, the value of the backup pointer is copied back into the current V-data buffer pointer. By this method, the receiver is ensured to de-multiplex the V-data from the receiving channel 24 bytes at a time. The main controller circuit 313 operating at the service level in the receive mode monitors the flag requesting service of V-data. If the flag is set, the main controller circuit 313 gets the V-data from the V-data buffer and transmits it to the voice control DSP circuit 306 24 bytes at a time. After sending a block of V-data, it subtracts 24 from the value in the total V-data counter.
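A sketch of the receive-side de-multiplexing. It assumes the HDLC layer has already removed flags and CRC bytes, so `block` holds the header byte and the payload. It interprets the end-of-block check as "discard any partially received packet by restoring the backup pointer". The capacity `max_total` is hypothetical. Buffer lengths stand in for the pointers described in the text.

```python
from typing import List

V_HEADER = 0x80
PACKET_BYTES = 24
MAX_BLOCKS = 5


class VDataReceiver:
    def __init__(self, max_total: int = 240) -> None:
        self.buffer = bytearray()        # V-data receive buffer
        self.total = 0                   # total V-data counter
        self.max_total = max_total       # hypothetical maximum
        self.service_requested = False

    def receive_block(self, block: bytes) -> bool:
        """Handle one received block; return False if it is C-data."""
        if not block or block[0] != V_HEADER:
            return False                 # follow the normal modem protocol
        byte_counter = PACKET_BYTES      # receive V-data byte counter
        block_counter = 0                # receive block counter
        backup = len(self.buffer)        # backup pointer
        for b in block[1:]:
            if block_counter >= MAX_BLOCKS or self.total >= self.max_total:
                continue                 # throw the byte away
            self.buffer.append(b)        # temporary pointer advances
            byte_counter -= 1
            if byte_counter == 0:        # a full 24-byte packet arrived
                backup = len(self.buffer)
                self.total += PACKET_BYTES
                byte_counter = PACKET_BYTES
                block_counter += 1
                self.service_requested = True
        del self.buffer[backup:]         # roll back an incomplete packet
        return True

    def service(self) -> List[bytes]:
        """Service level: hand complete packets to the voice DSP."""
        out = []
        while self.total >= PACKET_BYTES:
            out.append(bytes(self.buffer[:PACKET_BYTES]))
            del self.buffer[:PACKET_BYTES]
            self.total -= PACKET_BYTES
        self.service_requested = False
        return out
```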

Negotiation of Voice Compression Rate

The modem hardware component 20 incorporates a modified packet protocol for negotiation of the speech compression rate. A modified supervisory packet is formatted using the same open flag, address, CRC, and closing flag formatting bytes found in the CCITT V.42 standard data supervisory packet, as is well known in the industry and as described in the CCITT Blue Book, Volume VIII, entitled Data Communication over the Telephone Network (1989), referenced above. In the modified packet protocol embodiment, the set of CCITT standard header bytes (control words) has been extended to include nonstandard control words used to signal transmission of a nonstandard communication command. The use of a nonstandard control word does not conflict with other data communication terminals, for example when communicating with a non-PCS (Personal Communications System) modem system, since the nonstandard packet will be ignored by a non-PCS system.

Table 22 offers one embodiment of the present invention showing a modified supervisory packet structure. Table 22 omits the CCITT standard formatting bytes (open flag, address, CRC, and closing flag); however, these bytes are described in the CCITT standard. The modified supervisory packet is distinguished from a V.42 standard packet by using a nonstandard control word, such as 80 hex, as the header. The nonstandard control word does not conflict with V.42 standard communications.

TABLE 22: Modified Supervisory Packet Structure

| 80h | ID | LI | ACK | data | data | data |

The modified supervisory packet is transmitted by the HDLC 
protocol in which data is transmitted in synchronous mode and checked by 
35 CRC error checking. The use of a modified supervisory packet eliminates the 
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need for an escape command sent over the telephone line to interrupt data 
communications, providing an independent channel for negotiation of the 
compression rate. The channel may also be used as an alternative means for 
programming standard communications parameters. 

The modified supervisory packet is encoded with different function codes to provide an independent communications channel between hardware components. This allows the voice compression rate to be negotiated and programmed in real time while voice data and conventional data continue to be transmitted, without conventional escape routines. The function code can be encoded into the modified supervisory packet in several ways. For example, in one embodiment, the function code is embedded in the packet as one of the data words, at a predetermined position. In an alternate embodiment, the supervisory packet header signals a nonstandard supervisory packet and contains the compression rate to be used between the sites. In such an embodiment, for example, a different nonreserved header is assigned to each function code. These embodiments are not limiting, and other methods known to those skilled in the art may be employed to encode the function code into the modified supervisory packet.
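The two function-code embodiments can be contrasted in a short decoding sketch. Only the 80h and 81h headers appear in the text. The 82h and 83h headers, the data-word offset, and the data-word code values are hypothetical placeholders, as are the function names.

```python
from typing import Dict

# Embodiment with one nonreserved header per function code.
# 80h and 81h come from the text; 82h and 83h are hypothetical.
HEADER_FUNCTIONS: Dict[int, str] = {
    0x80: "speech_compression_command",
    0x81: "speech_compression_ack",
    0x82: "multiplex",        # hypothetical
    0x83: "remote_control",   # hypothetical
}

# Embodiment with a single 80h header and the function code in a data word
# at a predetermined position. Offset and code values are hypothetical.
FUNCTION_OFFSET = 4  # first byte after 80h, ID, LI, ACK
DATA_FUNCTIONS: Dict[int, str] = {
    0x01: "speech_compression_command",
    0x02: "multiplex",
    0x03: "remote_control",
}


def function_from_header(packet: bytes) -> str:
    """Header-per-function embodiment: the header byte alone names the function."""
    try:
        return HEADER_FUNCTIONS[packet[0]]
    except (IndexError, KeyError):
        raise ValueError("not a modified supervisory packet") from None


def function_from_data_word(packet: bytes) -> str:
    """Data-word embodiment: 80h marks the packet, a fixed byte names the function."""
    if len(packet) <= FUNCTION_OFFSET or packet[0] != 0x80:
        raise ValueError("not a modified supervisory packet")
    try:
        return DATA_FUNCTIONS[packet[FUNCTION_OFFSET]]
    except KeyError:
        raise ValueError("unknown function code") from None
```

The header-per-function approach spends header values but leaves every data byte free for parameters. The data-word approach needs only one nonstandard header at the cost of one data byte.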

Referring once again to Figure 1, a system consisting of PCS modem 20 and data terminal 10 is connected via phone line 30 to a second PCS system comprising PCS modem 20A and data terminal 10A. In this example, calling modem 20 initiates communication with receiving modem 20A. In one embodiment of the present invention, a speech compression command is sent in a modified supervisory data packet as the request to negotiate the speech compression algorithm and ratio. The speech compression command encodes the particular speech compression algorithm and the speech compression ratio desired by the calling PCS modem 20. Several methods exist for encoding the speech compression algorithm and compression ratio. For example, in embodiments where the function code is embedded in the header byte, the first data byte of the modified supervisory packet could identify the speech compression algorithm using a binary coding scheme (e.g., 00h for Vector Quantization, 01h for CELP+, 02h for VCELP, 03h for TrueSpeech, etc.). A second data byte could encode the speech compression ratio (e.g., 00h for 9.6 kbit/s, 01h for 16 kbit/s, 02h for 8 kbit/s, etc.). This embodiment of the speech compression command supervisory packet is shown in Table 23.

TABLE 23: Speech Compression Command Supervisory Packet

80h | ID | LI | ACK | Algthm | CRatio | data

(Algthm: speech compression algorithm code; CRatio: speech compression ratio code.)
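The Table 23 layout can be sketched as an encoder and decoder pair. The algorithm and ratio codes are the example codes given in the text. The ID, LI, and ACK bytes are passed through as placeholders because the text does not define their contents, and the function names are hypothetical.

```python
from typing import Tuple

SPEECH_COMMAND_HEADER = 0x80

# Example code assignments from the text.
ALGORITHM_CODES = {
    "Vector Quantization": 0x00,
    "CELP+": 0x01,
    "VCELP": 0x02,
    "TrueSpeech": 0x03,
}
RATE_CODES = {9600: 0x00, 16000: 0x01, 8000: 0x02}  # bit/s -> code


def encode_speech_command(algorithm: str, rate_bps: int,
                          packet_id: int = 0x00, li: int = 0x00,
                          ack: int = 0x00) -> bytes:
    """Build 80h | ID | LI | ACK | Algthm | CRatio (ID/LI/ACK are placeholders)."""
    return bytes([SPEECH_COMMAND_HEADER, packet_id, li, ack,
                  ALGORITHM_CODES[algorithm], RATE_CODES[rate_bps]])


def decode_speech_command(packet: bytes) -> Tuple[str, int]:
    """Return (algorithm name, rate in bit/s) from a Table 23 packet."""
    if len(packet) < 6 or packet[0] != SPEECH_COMMAND_HEADER:
        raise ValueError("not a speech compression command")
    algorithms = {code: name for name, code in ALGORITHM_CODES.items()}
    rates = {code: bps for bps, code in RATE_CODES.items()}
    return algorithms[packet[4]], rates[packet[5]]
```

For instance, `encode_speech_command("CELP+", 16000)` yields the bytes `80 00 00 00 01 01`.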

Alternatively, as stated above, the function code could be stored in a predetermined position of one of the packet data bytes. Other function code encoding embodiments are possible without deviating from the scope and spirit of the present invention, and the embodiments offered are not intended to be exclusive or limiting.

In either case, the receiving PCS modem 20A recognizes the speech compression command and responds with an acknowledge packet using, for instance, a header byte such as 81 hex. Through the ACK field of the supervisory packet shown in Table 23, the acknowledge packet alerts the calling modem 20 that the selected speech compression algorithm and speech compression ratio are available. Receipt of the acknowledge supervisory packet causes the calling modem 20 to transmit subsequent voice over data information according to the selected speech compression algorithm and compression ratio.
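The command and acknowledge exchange can be simulated with two modem objects. This is a sketch under stated assumptions: the ACK field values 01h (available) and 00h (not available) are hypothetical, the ID and LI bytes are placeholders, and `PCSModem` is an illustrative name rather than a component defined in the text.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

COMMAND_HEADER = 0x80
ACK_HEADER = 0x81
ACCEPTED, REFUSED = 0x01, 0x00  # hypothetical ACK field values

Setting = Tuple[int, int]  # (algorithm code, compression ratio code)


@dataclass
class PCSModem:
    supported: Set[Setting]
    active: Optional[Setting] = None

    def request(self, algorithm: int, ratio: int) -> bytes:
        """Calling side: speech compression command (ID, LI, ACK placeholders)."""
        return bytes([COMMAND_HEADER, 0x00, 0x00, 0x00, algorithm, ratio])

    def on_command(self, packet: bytes) -> bytes:
        """Receiving side: answer with an 81h acknowledge packet."""
        setting = (packet[4], packet[5])
        ok = setting in self.supported
        if ok:
            self.active = setting  # decode incoming voice with this setting
        ack = ACCEPTED if ok else REFUSED
        return bytes([ACK_HEADER, packet[1], packet[2], ack, *setting])

    def on_ack(self, packet: bytes) -> bool:
        """Calling side: switch only if the ACK field reports availability."""
        if packet[0] == ACK_HEADER and packet[3] == ACCEPTED:
            self.active = (packet[4], packet[5])
            return True
        return False
```

Echoing the requested algorithm and ratio in the acknowledge packet lets the calling modem confirm exactly which setting was accepted before it switches encoders.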

The frequency with which the speech compression command supervisory packet is transmitted varies with the application. For moderate-quality voice over data applications, the speech compression algorithm need only be negotiated at the initialization of the phone call. For applications requiring more fidelity, the speech compression command supervisory packet is sent repeatedly throughout the call to renegotiate the compression. Renegotiation either accommodates new parties to the communication that have different speech compression algorithm limitations or actively tunes the speech compression ratio as the quality of the communications link fluctuates.
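One way to decide when to renegotiate is to recompute the preferred voice rate from current link conditions. The following is a hypothetical policy sketch, not the negotiation rule of the invention. It assumes that conventional data is granted its demand first, that the least-compressed rate that fits the remaining bandwidth is chosen, and that no voice rate is needed during silence. Only the three rates come from the text.

```python
# Compression ratio codes from the text, ordered from least to most compression.
VOICE_RATES_BPS = (16000, 9600, 8000)


def select_voice_rate(link_bps: int, data_demand_bps: int, silent: bool) -> int:
    """Hypothetical policy: give conventional data its demand, then pick the
    least-compressed voice rate that fits in the remaining bandwidth."""
    if silent:
        return 0  # no speech to send; the whole link can carry data
    budget = link_bps - data_demand_bps
    for rate in VOICE_RATES_BPS:
        if rate <= budget:
            return rate
    return VOICE_RATES_BPS[-1]  # nothing fits: use maximum compression anyway


def needs_renegotiation(current_bps: int, link_bps: int,
                        data_demand_bps: int, silent: bool) -> bool:
    """Send a new speech compression command only when the choice changes."""
    chosen = select_voice_rate(link_bps, data_demand_bps, silent)
    return chosen != 0 and chosen != current_bps
```

Suppressing renegotiation when the choice is unchanged avoids flooding the supervisory channel with redundant commands while the link is stable.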
Other embodiments provide varying transmission rates of the speech compression command supervisory packet and different methods of negotiating the speech compression algorithm and compression ratio. Additionally, other embodiments for encoding the speech compression algorithm and the speech compression ratio in the supervisory packet may be incorporated without deviating from the scope and spirit of the present invention, and the described embodiments are not exclusive or limiting.

A new supervisory packet may be allocated as a means of negotiating the multiplexing scheme for the various types of information sent over the communications link. For example, if voice over data mode is employed, there exist several methods for multiplexing the voice and digital data. The multiplexing scheme may be selected by using a modified supervisory packet, called a multiplex supervisory packet, to negotiate the selection.

Similarly, another supervisory packet could be designated for remote control of another hardware device. For example, to control the baud rate or data format of a remote modem, a remote control supervisory packet could be encoded with the selection parameters needed to program the remote device.

Those skilled in the art will readily appreciate that there exist numerous other unidirectional and bidirectional communication and control applications in which the supervisory packet may be used. The examples given are not limiting, but are specific embodiments of the present invention offered for illustrative purposes.

The present inventions are to be limited only in accordance 
with the scope of the appended claims, since others skilled in the art may 
devise other embodiments still within the limits of the claims. 
WE CLAIM:

1. A communication module for use with a personal computer, comprising:

communications interface means connected for communicating to the personal computer for transferring data between the personal computer and the communications module;

telephone line interface means for connection to a telephone line;

voice interface means for receiving local voice signals from a local user and for conveying remote voice signals from a remote user to the local user;

full-duplex conversion means connected to the voice interface means for converting the local voice signals into outgoing digital voice data and for converting incoming digital voice data into the remote voice signals;

digital signal processor means connected to the full-duplex conversion means for compressing the outgoing digital voice data into compressed outgoing digital voice data having one of a plurality of selectable compression rates and for decompressing compressed incoming digital voice data into the incoming digital voice data at one of said plurality of selectable compression rates;

main control means connected for receiving the compressed outgoing digital voice data from the digital signal processor means, connected for receiving outgoing conventional digital data from the personal computer through the communications interface means, and operable for multiplexing the compressed outgoing digital voice data and the conventional digital data to produce multiplexed outgoing data; and

the main control means further operable for receiving multiplexed incoming data which contains incoming conventional digital data multiplexed with the compressed incoming digital voice data, for demultiplexing the incoming conventional digital data and the compressed incoming digital voice data, and for sending the incoming conventional digital data to the personal computer through the communications interface means and for sending the compressed incoming digital voice data to the digital signal processor means; and

the main control means further operable for negotiating the compression rate at various times during the transmission of conventional digital data multiplexed with the compressed digital voice data to change the compression rate.



2. The module according to claim 1 wherein the digital signal processor means is further operable for compressing the outgoing digital voice data into compressed outgoing digital voice data by performing the steps of:

a) removing any DC bias in the outgoing digital voice data to produce a normalized outgoing digital voice signal;

b) pre-emphasizing the normalized outgoing digital voice signal to produce a pre-emphasized outgoing digital voice signal;

c) dividing the pre-emphasized outgoing digital voice signal into segments to produce a current segment and a past segment;

d) predicting the pitch of the current speech segment to form a pitch prediction;

e) calculating the gain of the pitch of the current speech segment to form a prediction gain;

f) reconstructing the past speech segment from a compressed past segment to produce a reconstructed past segment;

g) finding the innovation in the current speech segment by comparing the pitch prediction to the reconstructed past segment to produce an error signal;

h) determining the maximum amplitude in the current speech segment;

i) quantizing the error signal using one of a plurality of code books, each code book corresponding to a different compression rate, the code books being generated from a representative set of speakers and environments, to produce a minimum mean squared error match in the form of an index into said one of said plurality of code books; and

j) recording the pitch prediction, the prediction gain, the maximum amplitude and the index into the selected code book in a packet as the compressed outgoing digital voice data.
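Steps a) through j) can be illustrated with a pure-Python sketch of a pitch-predictive vector quantizer. Everything numeric here is an assumption: the segment length, pre-emphasis coefficient, lag range, and code book sizes are hypothetical, and random code books stand in for the trained code books the claim describes, so the sketch does not reproduce the stated bit rates. The reconstruction of each segment is kept as the "reconstructed past segment" for the next one, as a decoder would compute it.

```python
import random
from typing import Dict, List, NamedTuple, Sequence, Tuple

SEGMENT = 40                 # hypothetical samples per segment
PRE_EMPHASIS = 0.9           # hypothetical pre-emphasis coefficient
MIN_LAG, MAX_LAG = 20, 120   # hypothetical pitch-lag search range in samples


class Packet(NamedTuple):
    lag: int          # pitch prediction (lag into reconstructed history)
    gain: float       # prediction gain
    max_amp: float    # maximum amplitude of the current segment
    index: int        # code book index of the best error match


def make_codebooks(seed: int = 0) -> Dict[int, List[List[float]]]:
    """Random stand-ins for trained code books; a bigger book means less compression."""
    rng = random.Random(seed)
    sizes = {16000: 256, 9600: 64, 8000: 16}  # hypothetical sizes per rate
    return {rate: [[rng.uniform(-1.0, 1.0) for _ in range(SEGMENT)]
                   for _ in range(size)]
            for rate, size in sizes.items()}


def preprocess(samples: Sequence[float]) -> List[float]:
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]                      # a) remove DC bias
    return [x[0]] + [x[n] - PRE_EMPHASIS * x[n - 1]      # b) pre-emphasis
                     for n in range(1, len(x))]


def pitch_predict(seg: List[float], history: List[float]) -> Tuple[int, float, List[float]]:
    """d), e): choose the lag whose periodic extension of the history best
    predicts the segment, and the least-squares gain for that lag."""
    best_lag, best_gain, best_pred, best_score = MIN_LAG, 0.0, [0.0] * len(seg), 0.0
    for lag in range(MIN_LAG, MAX_LAG + 1):
        start = len(history) - lag
        pred = [history[start + (n % lag)] for n in range(len(seg))]
        energy = sum(p * p for p in pred)
        if energy == 0.0:
            continue
        corr = sum(s * p for s, p in zip(seg, pred))
        score = corr * corr / energy  # error-energy reduction at the optimal gain
        if score > best_score:
            best_lag, best_gain, best_pred, best_score = lag, corr / energy, pred, score
    return best_lag, best_gain, best_pred


def compress(samples: Sequence[float], rate: int,
             codebooks: Dict[int, List[List[float]]]) -> List[Packet]:
    book = codebooks[rate]                   # code book for the selected rate
    x = preprocess(samples)
    history = [0.0] * MAX_LAG                # reconstructed past speech
    packets = []
    for start in range(0, len(x) - SEGMENT + 1, SEGMENT):   # c) segmentation
        seg = x[start:start + SEGMENT]
        lag, gain, pred = pitch_predict(seg, history)
        error = [s - gain * p for s, p in zip(seg, pred)]   # g) innovation
        max_amp = max(abs(s) for s in seg) or 1.0           # h) max amplitude
        target = [e / max_amp for e in error]
        index = min(range(len(book)),                        # i) minimum-MSE match
                    key=lambda i: sum((t - c) ** 2 for t, c in zip(target, book[i])))
        # f) decoder-side reconstruction, kept as the next segment's past segment
        recon = [gain * p + max_amp * c for p, c in zip(pred, book[index])]
        history = (history + recon)[-MAX_LAG:]
        packets.append(Packet(lag, gain, max_amp, index))   # j) record packet
    return packets
```

Because the encoder predicts from its own reconstruction rather than from the original speech, encoder and decoder stay in step even though quantization is lossy.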



3. The module according to claim 1 wherein the digital signal processor means is further operable for detecting silent periods in the outgoing digital voice data and for producing in response thereto a silence flag and wherein the main control means is further operable for transmitting outgoing conventional digital data on the telephone line when the silence flag indicates the absence of voice information and wherein the main control means is further operable for multiplexing and transmitting both the compressed outgoing digital voice data and the outgoing conventional digital data on the telephone line when the silence flag indicates the presence of voice information.
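The silence-flag behaviour of claim 3 can be sketched as follows. The claim does not state how silence is detected, so a mean-square energy threshold with a hypothetical value stands in for the detector. The `("voice", ...)` and `("data", ...)` tags are illustrative labels rather than a defined multiplex format.

```python
from typing import List, Optional, Sequence, Tuple

SILENCE_THRESHOLD = 1e-3  # hypothetical mean-square energy threshold


def silence_flag(segment: Sequence[float]) -> bool:
    """True when the segment's mean-square energy is below the threshold."""
    energy = sum(s * s for s in segment) / len(segment)
    return energy < SILENCE_THRESHOLD


def outgoing_frames(voice_packet: Optional[bytes], data: bytes,
                    silent: bool) -> List[Tuple[str, bytes]]:
    """Main-control decision: data alone during silence, voice and data
    multiplexed otherwise. The tags are illustrative labels only."""
    frames: List[Tuple[str, bytes]] = []
    if not silent and voice_packet is not None:
        frames.append(("voice", voice_packet))
    if data:
        frames.append(("data", data))
    return frames
```

During silent periods, the bandwidth that compressed voice would have used is left entirely to conventional data.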

4. A system for performing voice compression, comprising:

voice interface means including a voice input device for receiving voice signals from a user;

conversion means for converting the voice signals into digital voice data;

means for dividing the digital voice signal into segments and for serially producing therefrom a current voice segment and a past voice segment;

means for determining the predicted gain of the current voice segment;

means for determining the pitch values of the current voice segment;

coding means for predictive coding of the current voice segment by predicting the current voice segment as best as possible based on past recreated voice segments and for producing a difference signal value in response thereto;

means for selecting a compression rate for the speech compression;

means including a plurality of code books stored in a memory for comparing the difference signal value to stored difference values stored in one of the plurality of code books stored in the memory and for locating the memory address of the closest match between the difference signal value and the stored difference values;

output means for providing a compression packet for each of the segments, the packet including the memory address of the closest match between the difference signal value and the stored difference values, the predicted gain and the pitch values for each voice segment; and

output means further for providing a negotiation packet sent to determine the compression rate.

5. A method for compressing speech information, comprising the steps of:

selecting a compression rate;

receiving a speech signal from a person, sampling and digitizing the speech signal and dividing the digitized speech signal into a continuous stream of speech segments each having a plurality of digital samples;

selecting a current speech segment from the continuous stream of speech segments and reconstructing a past speech segment from a previously compressed past segment to produce a reconstructed past segment;

comparing the current segment on a sample by sample basis to the reconstructed past segment to obtain innovation information and an indicator of the error in the comparing; and

storing the innovation information and the indicator of the error in the comparing in a data packet as a compressed data packet.

6. The method according to claim 5 wherein the step of comparing includes the following steps:

determining the predicted gain of the current speech segment;

determining the pitch values of the current speech segment;

predictive coding the current speech segment by predicting the current speech segment as best as possible based on the recreated past speech segments and producing a difference signal value in response thereto; and

comparing the difference signal value to stored difference values stored in a memory and locating the memory address of the closest match between the difference signal value and the stored difference values.